Discriminative Similarity-Balanced Online Hashing for Supervised Image Retrieval

When virtualizing large-scale images of the real world, online hashing provides an efficient scheme for fast retrieval and compact storage. It converts high-dimensional streaming data into compact binary hash codes while preserving the structural relationships between samples in the Hamming space. Existing works usually update the hashing function based on the similarity between input data, or design a codebook to assign a code word to each single input sample. However, assigning code words to multiple samples while retaining the balanced similarity of the image instances is still challenging. To address this issue, we propose a novel discriminative similarity-balanced online hashing (DSBOH) framework in this work. In particular, we first obtain the Hadamard codebook that guides the generation of discriminative binary codes according to label information. Then, we maintain the correlation between the new data and the previously arrived data by the balanced similarity matrix, which is also generated from semantic information. Finally, we join the Hadamard codebook and the balanced similarity matrix into a unified hashing function to simultaneously maintain discrimination and balanced similarity. The proposed method is optimized by an alternating optimization technique. Extensive experiments on the CIFAR-10, MNIST, and Places205 datasets demonstrate that our proposed DSBOH performs better than several state-of-the-art online hashing methods in terms of effectiveness and efficiency.


Introduction
With the widespread use of digital monitoring facilities and the Internet, the generated streaming data have also increased correspondingly [1][2][3][4]. The processing of streaming data needs to be performed in approximately real time, which is very difficult for high-dimensional multimedia data such as images and videos [5,6]. Online hashing can encode high-dimensional streaming data that arrive online into compact binary codes with low storage and efficient computation [7,8]. In particular, it preserves the relationship among the samples in the Hamming space and updates the hashing function in the light of the newly arrived data to adapt to the new data instances [9,10]. In view of the advantages of low storage and efficient computation, online hashing is widely applied in education, finance, military, and other industries [11][12][13][14].
Most existing online hashing methods have been devoted to the trade-off between accuracy and efficiency [15][16][17]. According to the learning strategy, these techniques are divided into unsupervised online hashing and supervised online hashing [18][19][20]. The well-known unsupervised methods mainly include online sketch hashing (SketchHash) [21], FasteR online sketch hash (FROSH) [22], and zero-mean sketch [23]. SketchHash designs the hashing function with the sketch scheme [21]. FROSH uses the independent subsampling random Hadamard transform on various small data blocks to get a compact and accurate sketch while speeding up the sketching procedure [22]. The zero-mean sketch method solves the uncertainty problem of the offset value and improves the data processing efficiency by zero-mean sketch [23]. Supervised methods obtain better performance than unsupervised methods in most instances because they utilize label information. Some representative works include online hashing (OKH) [24,25], adaptive hashing (AdaptHash) [26], online supervised hashing (OSH) [27,28], online hashing with mutual information (MIHash) [29], balanced similarity for online discrete hashing (BSODH) [30], and Hadamard codebook-based online hashing (HCOH) [31]. These methods have achieved satisfactory performance.
However, some existing supervised online hashing methods still achieve unsatisfactory accuracy in real applications because they do not consider discrimination and balanced similarity jointly. More specifically, HCOH generates discriminative binary codes with maximum information entropy by the Hadamard codebook, but it ignores the local neighbor relationship among samples and only processes a single input at a time. On the other hand, BSODH only considers balanced similarity based on the pairwise relationship and neglects the global data distribution, which results in a decrease in accuracy [32,33]. Hence, both HCOH and BSODH have limitations when applied to real applications.
In this work, we put forward a novel discriminative similarity-balanced online hashing (DSBOH) framework, which can simultaneously preserve the global distribution information of data and the pairwise relationships between samples to generate discriminative hash codes with maximum information entropy. In particular, first, we maintain the maximum information entropy of hash codes via a Hadamard codebook. Then, the pairwise similarity matrix is adjusted into a balanced form so that the update of the hash codes preserves the correlation between the new and existing data. Finally, we combine the above attributes into a unified hashing function. An alternating iterative algorithm is used to solve the proposed DSBOH method. Compared with several state-of-the-art online hashing techniques, remarkable results have been achieved by our proposed DSBOH method.
In summary, the main contributions of this work include the following: (i) The Hadamard matrix is used to ensure that the hash codes with maximum information entropy are separable and can deal with situations with an unknown number of categories. (ii) We preserve the balanced similarity between newly arrived data and previously arrived data in the generated Hamming space using the inner product to deal with uneven data distribution. (iii) We combine the Hadamard codebook and the balanced similarity matrix into a unified hashing function to simultaneously maintain the discrimination and balanced similarity of the hashing model. (iv) The alternating iterative algorithm is used to optimize the proposed method, and experimental results verify that our method performs much better than several state-of-the-art online hashing techniques.
The remainder of this study is organized as follows. Section 2 gives a brief overview of the related works. In Section 3, we elaborate on the framework and optimization of the proposed method. Section 4 details the experimental results and analyses. Finally, we conclude the paper in Section 5.

Related Work
Huang et al. first proposed a prototype of online hashing termed OKH [24]. In each iteration, a new pair of data samples is used, a pairwise similarity loss function is designed according to the Hamming distance, and the prediction loss referring to Ref. [34] is used. The function evaluates whether the current hashing projection vectors suit the new data, and the model is expected to retain as much of the historical information of the previous round of projection vectors as possible during the update process. To put the original online hashing algorithm on a firmer theoretical footing with respect to its loss function, an improved weakly supervised online hashing model [25], which does not require the label information of the data, is proposed with a loss threshold for the hashing model. The new objective function is designed to calculate the disparity between the Hamming distances of pairwise data, and the upper bound on the loss of online hashing is rigorously analyzed. Second, because the hashing function learned at each update relies on the new data, it is easily biased toward those data; a multimodel strategy is introduced to reduce such bias.
Cakir et al. proposed the OSH method [27], which adapts to data changes and handles datasets whose label types are not known in advance. A random method is used to generate the codebook, and the mismatch between the codes generated by the hashing function and the code words of the corresponding categories in the codebook is minimized [35]. To retain the information of the last round, the previous hashing functions are linearly combined and superimposed; however, the codebook structure directly determines the coding efficiency. Therefore, in the follow-up literature, an improved online supervised hashing [28] is proposed for this problem. The ECOC codebook is applied to online supervised hashing, which improves the space efficiency; to address the complexity of the original Hamming loss formulation, an efficient solution method based on an upper bound is proposed, which improves the time efficiency of the algorithm.
AdaptHash [26] uses the relationship between data sample similarity and Hamming distance to solve the problem of how online models adapt to current data. First, the objective function is constructed using the similarity of the current sample pair and the Hamming distance relationship, combined with the minimum loss variance function [36], and the gradient descent algorithm is used to solve for the hash projection vectors; the objective function is further generalized so that the Hamming distance of similar (dissimilar) sample pairs is minimized (maximized). To reduce the update redundancy caused by the update mechanism, the hinge loss function [37] is finally used to select the hash mappings with the largest error, and the iterative calculation is repeated until the number of iterations reaches the set value.
MIHash [29] adopts the theory of quantitative information coding to obtain high-quality hash codes and eliminate unnecessary hash table updates. The mutual information between the dataset samples is well correlated with standard evaluation indicators and is used to calculate the information entropy. When optimizing the mutual information target, differentiable histogram merging technology is used to derive stochastic gradient descent-based optimization rules, and finally, the differentiated rules are utilized to merge the derived histograms and apply them to the learning objective function. This work is dedicated to the synchronization of the hash code and the hashing function updates and effectively reduces the reconstruction of the hash table.
BSODH [30] studies the relationship between new data and previously arrived data. This work attributes the problem of online hashing to two issues: updating imbalance and optimization inefficiency. The authors adopt asymmetric graph regularization to preserve the relevance between online streaming data and previously accumulated datasets. To deal with data imbalance in the learning stage of online hashing, BSODH designs a new balanced similarity matrix between new data and previously arrived data, which tackles the quantization error brought by relaxation learning through discrete optimization in online learning and achieves better results than the quantization-based schemes.
In addition, some existing offline deep hashing methods [38][39][40][41][42] use deep learning techniques to train the hashing function and map the image data into low-dimensional binary codes to perform image retrieval, but as the amount of data increases, retraining the model consumes more time whenever new data arrive. For example, deep transfer hashing (DTH) [42] trains a CNN model and inputs the online generated image pairs and their labels into the network. The loss function of the model makes the outputs of similar instances close, while the outputs of dissimilar instances are pushed farther apart, thus obtaining binary codes that represent the semantic structure of the original image pairs. However, this method requires a complex relaxation process and a relatively large number of bits to obtain satisfactory retrieval results.

Proposed Method
Figure 1 shows the overall framework of our proposed discriminative similarity-balanced online hashing (DSBOH), which contains two main modules, namely the discriminative codebook and the balanced similarity. The details of the proposed DSBOH are presented as follows.

Notations.
Assume that $n_t$ data samples $X^t = [x_1^t, x_2^t, \ldots, x_{n_t}^t] \in \mathbb{R}^{d \times n_t}$ are fed into the system at stage $t$, whose corresponding labels are expressed as $L^t = [l_1^t, l_2^t, \ldots, l_{n_t}^t]$. The mapping matrix to be learned for reducing the $d$-dimensional real-valued data $X^t$ to the $k$-dimensional binary data $B^t$ is represented as $W^t \in \mathbb{R}^{d \times k}$. The expression of $B^t$ is defined as follows:

$$B^t = F(X^t) = \mathrm{sgn}(W^{tT} X^t), \tag{1}$$

where $F(\cdot)$ represents the hashing function, $W^{tT}$ represents the transpose of $W^t$, and $\mathrm{sgn}(\cdot)$ is the sign function defined as follows:

$$\mathrm{sgn}(x) = \begin{cases} 1, & x \geq 0, \\ -1, & \text{otherwise}. \end{cases} \tag{2}$$

To retain the similarity or dissimilarity relationship between the newly arrived streaming data and the previously arrived data, we consider constructing the hashing function with a similarity matrix. At stage $t$, the currently arriving data are defined as $X_c^t = [x_{c1}^t, x_{c2}^t, \ldots, x_{cn_t}^t]$, whose corresponding labels are represented as $L_c^t$, and the generated hash codes are represented as $B_c^t$. The data arriving before stage $t$ are $X_a^t = [X_c^1, X_c^2, \ldots, X_c^{t-1}]$, whose corresponding labels are represented as $L_a^t$, and the generated hash codes are represented as $B_a^t$. All symbol notations utilized in this study are presented in Table 1.
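As a minimal sketch of equations (1) and (2), the snippet below computes the hash codes of a small batch in plain Python. The helper names `sgn` and `hash_codes` and the toy values of `W` and `X` are illustrative assumptions and do not come from the paper.

```python
def sgn(v):
    """Sign function of equation (2): +1 for v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1


def hash_codes(W, X):
    """Compute B = sgn(W^T X) as in equation (1).

    W is a d x k projection matrix and X is a d x n data matrix whose
    columns are samples (both given as lists of rows). The result is a
    k x n matrix with entries in {-1, +1}.
    """
    d, k, n = len(W), len(W[0]), len(X[0])
    return [[sgn(sum(W[r][b] * X[r][j] for r in range(d))) for j in range(n)]
            for b in range(k)]


# Toy example (assumed values): d = 3 features, k = 2 bits, n = 2 samples.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
X = [[0.5, -2.0],
     [-1.0, 3.0],
     [4.0, 4.0]]
B = hash_codes(W, X)  # column j of B is the 2-bit code of sample j
```

Here the first bit of each code only looks at the first feature and the second bit at the second feature, so the two samples receive opposite codes.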

Hadamard Codebook.
To maintain the maximum information entropy of hash codes, we construct a Hadamard codebook in three steps. First, we generate an orthogonal Hadamard matrix that is $2^q$-dimensional ($q$ is a positive integer) according to the definition $C_{ij} = (-1)^{(i-1)(j-1)}$, where $C_{ij}$ is the $j$th element of the $i$th row in matrix $C$. Here the exponent is read as the bitwise inner product of the binary representations of $i-1$ and $j-1$ (the Sylvester construction); an ordinary integer product would not give orthogonal rows. The Hadamard matrix can generate independent hash codes that satisfy two principles of the error-correcting output code: the Hamming distance between columns is maximized to ensure a significant difference between classifiers, and the Hamming distance between rows is maximized to provide a strong error correction ability. Care should be taken that the dimension of the Hadamard matrix is somewhat larger than the number of labels. Second, we assign data from the same class to the same column vector of the Hadamard matrix $C$, which serves as the target vector in the Hadamard codebook. In particular, when a batch of new data is received, we randomly and nonrepeatedly select certain columns in the Hadamard matrix to construct virtual multilabel vectors in the Hadamard codebook. When the label of the new data is the same as that of data that arrived before, it is assigned to the same column vector. These vectors are aggregated to form the codebook. Finally, we use locality-sensitive hashing (LSH) [43] to align the code length of the Hadamard codebook with that of the hash codes.
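The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class `HadamardCodebook`, the function `lsh_align`, the seeds, and the use of random Gaussian projections followed by a sign for the LSH alignment are all hypothetical choices made for the example.

```python
import random


def hadamard(q):
    """Sylvester Hadamard matrix of order 2**q with entries in {-1, +1}.

    With 0-based indices, entry (i, j) is (-1) ** popcount(i & j), i.e. the
    exponent is the bitwise inner product of the two indices.
    """
    n = 1 << q
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(n)]
            for i in range(n)]


class HadamardCodebook:
    """Assigns one distinct Hadamard column to every class label on demand."""

    def __init__(self, q, seed=0):
        self.matrix = hadamard(q)
        self.size = 1 << q
        # Shuffle the column indices once so that columns are drawn randomly
        # and without repetition as new labels show up in the stream.
        self.free_columns = list(range(self.size))
        random.Random(seed).shuffle(self.free_columns)
        self.column_of = {}

    def codeword(self, label):
        """Return the target vector (a Hadamard column) for `label`."""
        if label not in self.column_of:
            if not self.free_columns:
                raise ValueError("more labels than Hadamard columns")
            self.column_of[label] = self.free_columns.pop()
        col = self.column_of[label]
        return [row[col] for row in self.matrix]


def lsh_align(codeword, k, seed=0):
    """Map a codeword to k bits with random Gaussian projections (assumed LSH form)."""
    rng = random.Random(seed)
    bits = []
    for _ in range(k):
        proj = sum(rng.gauss(0.0, 1.0) * c for c in codeword)
        bits.append(1 if proj >= 0 else -1)
    return bits


book = HadamardCodebook(q=3)          # 8 columns, enough for up to 8 classes
cat, dog = book.codeword("cat"), book.codeword("dog")
short_cat = lsh_align(cat, k=4)       # align the 8-bit code word to 4 bits
```

Because distinct Hadamard columns are orthogonal, the code words of two different classes differ in exactly half of their positions, which is the separation property the codebook relies on.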
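To maintain the independence of the hash code and retain the global distribution information, we define the loss function $L_1$ based on the Hadamard codebook as follows:

$$L_1 = \sum_{i=1}^{n_t} \left\| F(x_i^t) - c_{J(x_i^t)} \right\|_F^2, \tag{3}$$

where $c_{J(x_i^t)}$ is the codebook vector assigned to $x_i^t$, $J(x_i^t)$ denotes the label category of $x_i^t$, and $\| \cdot \|_F$ is the Frobenius norm of a matrix.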

Balanced Similarity.
Suppose that there are two input data $x_i$ and $x_j$, whose corresponding labels are $l_i$ and $l_j$ and whose hash codes are expressed as $B_i$ and $B_j$, respectively. $S_{ij}$ represents the similarity of $x_i$ and $x_j$. If $x_i$ and $x_j$ belong to one category, that is, $l_i = l_j$, then $S_{ij} = 1$; otherwise, $S_{ij} = -1$. We expect that the hash codes within the same category are the same; that is, $B_i = B_j$. Because the product of two identical binary entries is 1, $B_i^T B_j = k$. We also expect that the hash codes from different categories are different; that is, $B_i \neq B_j$ and, ideally, $B_i^T B_j = -k$. In sum, the product of $B_i^T$ and $B_j$ has a common value with $kS_{ij}$, which means that we can retain the similarity relationship of input data in the Hamming space through the above method as follows:
To keep the similarity relationship constructed from the newly arrived data $X_c^t$ at stage $t$ and the data $X_a^t$ before stage $t$ in the Hamming space, the relationship between the inner product of the binary codes $B_c^t$ and $B_a^t$ and the similarity $S^t$ is used. In addition, with the increase in new instances, the similarity matrix $S^t$ becomes more and more sparse because most image pairs are dissimilar [30]. To prevent the model from overly relying on dissimilar information and ignoring the information of similar pairs, we adjust the similarity matrix according to the similarity and dissimilarity and convert the similarity matrix $S^t$ to the balanced similarity matrix $\tilde{S}^t$ by multiplying by different balance factors. The balanced similarity matrix $\tilde{S}^t$ is defined as follows: where $m_t = \sum_{i=1}^{t-1} n_i$ denotes the total number of instances that arrived before stage $t$.
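The idea of the balanced similarity can be illustrated with a short sketch. The factor names `eta_s` and `eta_d` and their values are assumptions for the example (the text only states that similar and dissimilar entries are multiplied by different balance factors), and the helper names are hypothetical.

```python
def similarity(labels_new, labels_old):
    """Pairwise similarity between new and old samples: +1 if same label, else -1."""
    return [[1 if a == b else -1 for b in labels_old] for a in labels_new]


def balance(S, eta_s, eta_d):
    """Scale similar entries by eta_s and dissimilar entries by eta_d."""
    return [[eta_s * s if s == 1 else eta_d * s for s in row] for row in S]


def inner(b_i, b_j):
    """Inner product of two k-bit codes with entries in {-1, +1}."""
    return sum(x * y for x, y in zip(b_i, b_j))


labels_new = [0, 1]             # n_t = 2 newly arrived samples
labels_old = [0, 2, 2, 1]       # m_t = 4 previously arrived samples
S = similarity(labels_new, labels_old)
# Down-weight the many dissimilar pairs relative to the few similar ones
# (assumed factor values).
S_bal = balance(S, eta_s=1.2, eta_d=0.2)

code = [1, -1, 1, 1]            # a k = 4 bit code
opposite = [-c for c in code]
same_value = inner(code, code)        # equals k for identical codes
diff_value = inner(code, opposite)    # equals -k for fully opposite codes
```

In this toy batch only two of the eight pairs are similar, which is the imbalance that the smaller factor on dissimilar entries is meant to counteract.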

Overall Formulation.
Different from HCOH and BSODH, which find the global data distribution or balanced similarity via a local neighbor relationship, DSBOH aims to generate discriminative binary codes for single or multiple inputs by preserving global distribution information with the help of the Hadamard codebook and the local pairwise relationship between the newly arrived data and the previously arrived data in a seamless framework. When the data volume grows rapidly, the model still has a strong generalization ability because we consider retaining the semantic relationship between the data at different stages. Furthermore, hash codes are independent and discriminative due to the use of the codebook. Therefore, we combine the loss function $L_1$ of the Hadamard codebook hashing function in equation (3) and the loss function $L_2$ of balanced similarity preservation in equation (6) into the same objective function, which is expressed as follows: where $\lambda^t$ is the parameter to control the importance.
To minimize the quantization error between the learned hashing function $F(X^t)$ and the target hash code $B^t$, the quantized loss function is defined as follows: Finally, adding equation (8) into (7), and adding the Frobenius norm of $W^t$ as a regularization term, the overall formulation is expressed as follows: where $\sigma^t$ and $\epsilon^t$ are parameters to control the importance of each module.

Alternating Optimization.
Owing to the discrete restrictions of the binary codes, the optimization problem of the variables in equation (9) is nonconvex [44,45]. In this regard, an alternating optimization technique is adopted to deal with our proposed loss function $L$. That is, when a variable is updated, the others are fixed as constants. The specific details of the implementation are introduced as follows.
(1) Solving $W^t$: fix $B_a^t$ and $B_c^t$, so that the first term in equation (9) can be eliminated. The objective function becomes: Replacing the formula $F(X^t) = \mathrm{sgn}(W^{tT} X^t)$ in equation (1) with $F(X^t) = \tanh(W^{tT} X^t)$ for optimization convenience, we obtain the following: Using the identity $\|A\|_F^2 = \mathrm{tr}(A^T A)$ for a matrix $A$, we convert equation (11) into the form of the trace of the matrix as follows:

Scientific Programming
After simplification, we obtain the following: where $I$ stands for the $d$-dimensional identity matrix. We take the partial derivative of equation (14) with respect to $W^t$ and set the result to zero. That is: Therefore, we update $W^t$ with the following equation:
(2) Solving $B_a^t$: fix $W^t$ and $B_c^t$; therefore, only the first term remains in equation (9). The objective function now becomes: According to Ref. [46], the F norm is changed to the L1 norm; the result is as follows:
(3) Solving $B_c^t$: fix $W^t$ and $B_a^t$. Equation (9) becomes: For further optimization, we remove irrelevant items and obtain the following: where $P = k\lambda^t B_a^t \tilde{S}^{tT} + \sigma^t W^{tT} X_c^t$. According to supervised discrete hashing (SDH) [6] and BSODH [30], the optimization in equation (20) is NP hard, so we turn the matrix into a combination of row vectors, transferring the problem into row-by-row updating. That is to say, equation (20) becomes: where $b_{cr}^t$, $b_{ar}^t$, and $p_r$ are the $r$th rows of $B_c^t$, $B_a^t$, and $P$; $\tilde{B}_c^t$, $\tilde{B}_a^t$, and $\tilde{P}$ are the remaining parts of $B_c^t$, $B_a^t$, and $P$ except for the $r$th row, respectively. The above formula is expanded to obtain the following: Equation (22) is simplified to obtain the following: Therefore, we update row by row according to the following rules:
The proposed DSBOH is summarized in Algorithm 1.
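As a rough illustration of step (2) only, the sketch below assumes a sign-based closed form, $B_a = \mathrm{sgn}(B_c \tilde{S})$, of the kind that L1 relaxations of this inner-product objective typically yield. Whether this matches the update derived in this section term by term is an assumption; the function name `update_old_codes` and the toy values are hypothetical.

```python
def sgn(v):
    """Sign function: +1 for v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1


def update_old_codes(B_c, S_bal):
    """Assumed closed form B_a = sgn(B_c @ S_bal).

    B_c is the k x n_t matrix of codes for the newly arrived data and
    S_bal is the n_t x m_t balanced similarity matrix; the result is the
    k x m_t matrix of codes for the previously arrived data.
    """
    k, n_t, m_t = len(B_c), len(S_bal), len(S_bal[0])
    return [[sgn(sum(B_c[b][i] * S_bal[i][j] for i in range(n_t)))
             for j in range(m_t)]
            for b in range(k)]


# One new sample with a 3-bit code; it is similar to the first old sample
# and dissimilar to the second (assumed balanced similarity values).
B_c = [[1], [-1], [1]]
S_bal = [[1.0, -0.5]]
B_a = update_old_codes(B_c, S_bal)
```

With a single new sample, the similar old sample inherits the new code and the dissimilar one receives its negation, so the inner products become $k$ and $-k$, in line with the relation $B_i^T B_j = kS_{ij}$ used in the balanced similarity discussion.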

Experiments
To demonstrate the effectiveness of DSBOH, we conduct extensive experiments on three widely used image datasets in this section and compare it with several advanced online hashing techniques.

Datasets.
CIFAR-10 [47] is a widely used dataset for image retrieval and classification. It is composed of 60,000 samples selected from ten classes, and each sample is represented by 4096-dimensional CNN features. The ten classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each category includes 6,000 samples. We randomly select 5,900 samples from each category as the training set; the remaining images are set as the testing set. From the training set, 20,000 instances are utilized for learning hashing functions [31]. Twenty example images from each category of CIFAR-10 are shown in Figure 2.
MNIST consists of 70,000 handwritten digit images in 10 categories, which cover the digits 0 to 9; each image is represented by a 784-dimensional vector. We randomly sample 100 instances from each class to construct the testing set and use the remaining part to compose the training set. A total of 20,000 images randomly selected from the training set are used to learn the hash model [18]. We randomly select 27 example instances from each class to show in Figure 3.
Places205 [48] is a large-scale scene-centric dataset that contains 205 common scene categories and 2.5 million images with labels. First, the features of each image are extracted from the fc-7 layer of AlexNet [49], and then, PCA is used to reduce these features to 128-dimensional vectors. We randomly choose 20 images from each category to form the test set, and the remaining images form the training set. A total of 100,000 images in the training set are randomly selected to learn the hashing functions. Two hundred randomly picked images of Places205 are shown in Figure 4.
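The per-class sampling used to build the test sets above can be sketched as follows. The function name `per_class_split`, the seed handling, and the toy label list are assumptions for illustration; the actual sampling code of the experiments is not given in the text.

```python
import random
from collections import defaultdict


def per_class_split(labels, n_test_per_class, seed=0):
    """Randomly hold out n_test_per_class samples of every class for testing.

    Returns two lists of sample indices: (train_idx, test_idx).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    train_idx, test_idx = [], []
    for lab in sorted(by_class):
        idxs = by_class[lab][:]
        rng.shuffle(idxs)                      # random choice within the class
        test_idx.extend(idxs[:n_test_per_class])
        train_idx.extend(idxs[n_test_per_class:])
    return train_idx, test_idx


# Toy stand-in for a labeled dataset: 3 classes with 10 samples each.
labels = [i % 3 for i in range(30)]
train_idx, test_idx = per_class_split(labels, n_test_per_class=2)
```

For MNIST this would be called with `n_test_per_class=100` and for Places205 with `n_test_per_class=20`; for CIFAR-10, where 5,900 training samples are drawn per class, the same helper applies with the 100 remaining samples per class as the test set.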

Parameter Settings.
According to experience, the values of $\lambda^t$, $\sigma^t$, and $\epsilon^t$ for the proposed DSBOH are searched over $\{0 : 0.05 : 5\}$, that is, from 0 to 5 in steps of 0.05. For the CIFAR-10 dataset, the best combination for $(\lambda^t, \sigma^t, \epsilon^t)$ is empirically set to (0.7, 0.3, 0.8). For the MNIST dataset, we set (0.1, 0.3, 1.2) as the configuration of $(\lambda^t, \sigma^t, \epsilon^t)$. For the Places205 dataset, (0.1, 0.8, 0.2) corresponds to $(\lambda^t, \sigma^t, \epsilon^t)$. Table 2 shows the detailed parameters of DSBOH on the CIFAR-10, MNIST, and Places205 datasets. In addition, we conducted experiments with hash codes of different lengths from the set {8, 16, 32, 48, 64, 128}. It is worth mentioning that SketchHash requires the size of a batch to be greater than the length of the hash codes [21]. Hence, we only show the results of SketchHash under 64 bits.

Evaluation Protocols.
To evaluate the proposed method, we apply a set of widely adopted protocols: the mean average precision (mAP); the mean average precision over the first 1,000 retrieved samples (mAP@1000), which is used for the large dataset Places205 to reduce the calculation time; the precision within a Hamming sphere with a radius of 2 centered on every query point (Precision@H2); and the average precision of the top-R retrieved neighbors (Precision@R). We also compare the running time on CIFAR-10 and MNIST with other methods. Additionally, the precision-recall curves on CIFAR-10 and MNIST are adopted to evaluate our proposed method.
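The protocols above can be made concrete with a small per-query sketch. The helper names, the tie handling (stable sort by Hamming distance), and the conventions of normalizing the average precision by the number of relevant items retrieved and of returning 0 for an empty Hamming ball are assumptions for the example; evaluation scripts differ on these details.

```python
def hamming(a, b):
    """Hamming distance between two codes with entries in {-1, +1}."""
    return sum(x != y for x, y in zip(a, b))


def ranked_relevance(q_code, q_label, db_codes, db_labels):
    """Relevance flags of database items, ranked by Hamming distance to the query."""
    order = sorted(range(len(db_codes)), key=lambda i: hamming(q_code, db_codes[i]))
    return [db_labels[i] == q_label for i in order]


def average_precision(rel, top=None):
    """Average precision of one ranked list; `top` truncates it (e.g. 1000)."""
    rel = rel[:top] if top else rel
    hits, total = 0, 0.0
    for rank, is_rel in enumerate(rel, start=1):
        if is_rel:
            hits += 1
            total += hits / rank      # precision at this relevant position
    return total / hits if hits else 0.0


def precision_at_r(rel, r):
    """Fraction of relevant items among the top-r retrieved items."""
    return sum(rel[:r]) / r


def precision_h2(q_code, q_label, db_codes, db_labels, radius=2):
    """Precision among database items within the given Hamming radius."""
    inside = [db_labels[i] == q_label
              for i, c in enumerate(db_codes) if hamming(q_code, c) <= radius]
    return sum(inside) / len(inside) if inside else 0.0


# Toy query against a 4-item database of 4-bit codes.
q_code, q_label = [1, 1, 1, 1], 0
db_codes = [[1, 1, 1, 1], [1, 1, 1, -1], [-1, -1, 1, 1], [-1, -1, -1, -1]]
db_labels = [0, 1, 0, 1]
rel = ranked_relevance(q_code, q_label, db_codes, db_labels)
ap = average_precision(rel)
p_at_2 = precision_at_r(rel, 2)
p_h2 = precision_h2(q_code, q_label, db_codes, db_labels)
```

The mAP is the mean of `average_precision` over all queries, and mAP@1000 is obtained by passing `top=1000`.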

Compared Methods.
We compare the proposed DSBOH with several advanced online hashing methods, including OKH [25], OSH [28], AdaptHash [26], SketchHash [21], and BSODH [30]. All of the above methods are run using their publicly available source code. We implement all the methods using MATLAB on a single computer equipped with a 3.0 GHz Intel Core i5-8500 CPU and 16 GB RAM; all results shown in this work are the average of three runs.
The mAP and Precision@H2 results of DSBOH and the compared methods on the CIFAR-10 dataset are reported in Table 3. From this table, we can find that (1) mAP: in the case of 16 bits, 32 bits, 48 bits, 64 bits, and 128 bits hash codes, our proposed method improves on the second-best BSODH method by 6.5%, 1.4%, 4.0%, 1.1%, and 1.6%, respectively, whereas at 8 bits the mAP of DSBOH is slightly lower than that of BSODH. (2) Precision@H2: in the case of 8 bits, 16 bits, and 32 bits, our proposed method is 10.6%, 14.8%, and 4.6% better than the second-best BSODH, respectively. Although the Precision@H2 at 48 bits, 64 bits, and 128 bits of our proposed DSBOH is slightly lower than that of BSODH, our DSBOH performs better than the other online hashing methods.
Table 4 shows the mAP and Precision@H2 results of the proposed DSBOH and the compared techniques on the MNIST dataset. The results indicate that (1) mAP: the proposed DSBOH achieves an increase of 0.3%, 2.1%, 1.2%, 0.8%, 1.5%, and 2.1% in mAP compared with the second-best BSODH at 8 bits, 16 bits, 32 bits, 48 bits, 64 bits, and 128 bits, respectively. Hence, the superiority of DSBOH is demonstrated. (2) Precision@H2: the Precision@H2 of our DSBOH is better than that of BSODH by 9.5%, 9.4%, and 2.3% for 8 bits, 16 bits, and 32 bits, respectively. The performance of our DSBOH is slightly lower than that of BSODH at 48 bits, 64 bits, and 128 bits.
The experimental results of mAP@1000 and Precision@H2 on the Places205 dataset are reported in Table 5. From this table, we can learn that (1) mAP@1000: our proposed DSBOH is 1.3%, 0.5%, and 1.0% better than the second-best BSODH at 48 bits, 64 bits, and 128 bits, respectively, and ranks second at 8 bits, 16 bits, and 32 bits. (2) Precision@H2: the Precision@H2 of our DSBOH is the highest at 32 bits and 2.3% higher than that of the second-best method. For other hash bit lengths, DSBOH is slightly lower than the best method.
For further verification of the performance of our DSBOH, we execute comparative experiments on Precision@R under 16 bits, 32 bits, and 64 bits hash codes on the CIFAR-10 and MNIST datasets. As shown in Figure 5, the proposed approach consistently achieves the best Precision@R, which demonstrates the superiority of DSBOH. In addition, the precision-recall curves on the CIFAR-10 and MNIST datasets are shown in Figures 6(a) and 6(b), respectively. Both curves enclose a large area, which supports the effectiveness of our algorithm. To clearly show the performance, we calculate the area under the curve (AUC, shown in blue) of the PR curves on CIFAR-10 and MNIST and obtain AUCs of 95.70% and 97.28%, respectively, which verifies that our learning model achieves both high precision and high recall.
Figure 7 presents the training time of our proposed method and the compared approaches at 32 bits on the CIFAR-10 and MNIST datasets. In Figure 7(a), we notice that our proposed DSBOH runs faster than AdaptHash, OSH, and BSODH and is close to OKH and SketchHash. We find that on CIFAR-10, although OKH and SketchHash have the shortest training time, their model accuracy is very poor. The training time spent by DSBOH is the shortest among the remaining algorithms, and its training efficiency is the highest. According to Figure 7(b), the training time of every compared method exceeds that of our proposed DSBOH except for SketchHash. Therefore, our algorithm is efficient for online image retrieval.

Conclusion
In this study, we propose DSBOH, a novel scheme that combines global distribution and balanced similarity to generate discriminative hash codes for image retrieval. To this end, we utilize the Hadamard codebook to assist the construction of the hashing function and preserve the similarity between the newly arrived samples and the previously arrived samples when mapping from the original real-valued space into the Hamming space. Extensive experiments on three benchmark datasets demonstrate that DSBOH shows significant advantages in effectiveness and efficiency compared with several recent online hashing methods. Since we use the codebook to assign code words to single-label images, the problem of code word assignment for multilabel image retrieval is worthy of further study. It is also possible to study a codebook that can better store the structure information of the image data in the future.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.