AVBH: Asymmetric Learning to Hash with Variable Bit Encoding

Nearest neighbour search (NNS) is at the core of large-scale data retrieval. Learning to hash is an effective way to solve this problem by representing high-dimensional data as compact binary codes. However, existing learning to hash methods need long bit encodings to ensure query accuracy, and long encodings bring a large storage cost, which severely restricts their application to big data. An asymmetric learning to hash with variable bit encoding algorithm (AVBH) is proposed to solve this problem. The AVBH hash algorithm uses two types of hash mapping functions to encode the dataset and the query set into bit strings of different lengths. For the dataset, the frequencies of the hash codes obtained after random Fourier feature encoding are statistically analysed: hash codes with high frequency are compressed into longer code representations, and hash codes with low frequency are compressed into shorter ones. The query point is quantized to a long-bit hash code and compared with data points whose codes are cascade-concatenated to the same length. Experiments on public datasets show that the proposed algorithm effectively reduces the storage cost and improves query accuracy.


Introduction
Given a query object/point q and a dataset S, nearest neighbour search (NNS) [1][2][3] returns the nearest neighbours in S to q. Nowadays, NNS is widely used in many applications such as image retrieval, text classification, and recommendation systems. However, with the exponential growth of data scale and the curse of high data dimensionality, the NNS problem is now much more difficult to solve than before. Therefore, new efficient index structures and query algorithms for similarity search have increasingly become the focus of research on the problem. The hashing-based NNS methods [3][4][5] have attracted much attention. Generally, hashing methods project the original data, with locality preserved, into a low-dimensional Hamming space, i.e., binary codes [4][5][6]. The complexity of these methods is usually sublinear in time. In addition, hashing methods only need simple bit operations to compute similarity between Hamming encodings, which is very fast. Owing to their high performance in large-scale data retrieval, hashing techniques have gained increasing interest in facilitating cross-view retrieval tasks [7, 8], online retrieval tasks [9], and metric learning tasks [10].
For large-scale data retrievals, the time and space costs are the two important issues. As we know, the accuracy of existing hash methods is limited by the length of hash encoding and usually requires a longer coding to get better accuracy. However, a long coding will increase the space cost, network communication overhead, and response time.
In order to solve this problem, a coding quantization mechanism [11] based on an asymmetric hashing algorithm [12] was proposed. Instead of comparing hash codes directly, it cascade-concatenates the code of each data point up to the encoding length of the query point, which effectively reduces the storage cost of the dataset while preserving the accuracy of the result. However, this algorithm uses a unified compression scheme for all data, ignoring the effect of the data distribution. In practice, the distribution of large-scale data is generally uneven, so for most hashing algorithms, the quantization frequencies also differ. As we know, longer encodings preserve more of the original information but bring higher cost, and vice versa. A careful trade-off among accuracy, computing overhead, and space saving needs to be studied. Intuitively, high-density data require longer encodings to preserve as much of the original information as possible, while low-density data can use shorter encodings and still preserve most of the original information.
That is the idea behind our algorithm. In this paper, an asymmetric learning to hash with variable bit encoding algorithm (AVBH) is proposed. The AVBH uses two types of hash mapping functions to quantize the dataset and the query set separately into hash codes of different bit lengths. In particular, the frequency of each code in the dataset is calculated after random Fourier encoding; random Fourier codes with high frequency are then compressed into longer hash code representations, and those with low frequency into shorter ones. The main contributions of this paper are as follows: (1) a variable bit encoding mechanism (named AVBH) based on hash code frequency compression is proposed, which makes effective use of the encoding space, and (2) experiments show that the AVBH can effectively reduce the storage cost and improve query accuracy.

Preliminaries and Description
In this section, we review some basic knowledge of LSH (locality-sensitive hashing) [13][14][15], vector quantization [16], and product quantization [17] that is essential to our proposed technique.

Vector Quantization.
Vector quantization (VQ) is a classical data compression technique, which compresses the original data into discrete vectors. Formally, for a vector x of n dimensions, a VQ function f can be specified as f(x) ∈ C = {c_i | i = 1, 2, ..., k}, where x is an original data vector, C is a pretrained code set (codebook), and c_i is a codeword in C. The objective of a VQ function is to quantize the original real-valued vector to the nearest codeword with the lowest VQ loss. Here, the VQ loss of vector x is given by

loss(x) = min_{c_i ∈ C} ‖x − c_i‖^2. (1)
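As a small illustration of the VQ step above, the following numpy sketch quantizes each point to its nearest codeword and accumulates the loss of formula (1); the codebook and data are toy values, not from the paper:

```python
import numpy as np

def vq_encode(X, codebook):
    """Quantize each row of X to its nearest codeword; return indices and total VQ loss."""
    # squared Euclidean distance between every point and every codeword
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = d.argmin(axis=1)
    loss = d[np.arange(len(X)), idx].sum()
    return idx, loss

# toy codebook with k = 2 codewords in 2-D
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
X = np.array([[0.1, -0.1], [0.9, 1.2]])
idx, loss = vq_encode(X, codebook)   # idx = [0, 1]
```

In a real system, the codebook would be pretrained (e.g., by k-means) rather than given by hand.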

Product Quantization.
Product quantization (PQ) is an optimization of vector quantization. First, the feature space is divided into m mutually exclusive subspaces, and each subspace is then quantized separately using VQ. That is, the codewords of each subspace form a small codebook C_1, C_2, ..., C_m, and the small codebooks form a large codebook C by their Cartesian product. In this way, high-dimensional data can be decomposed into m low-dimensional spaces and processed in parallel. Suppose an object x is represented as a combination of m codewords c_1, c_2, ..., c_m; the product quantization loss of vector x is given by

loss(x) = Σ_{j=1}^{m} ‖x_j − c_j‖^2, (2)

where x_j is the projection of x onto the j-th subspace.

Random Fourier Feature.
Traditional dimensionality reduction methods, such as PCA, map the data to an independent feature space and compute the main independent features. This approach ignores the nonlinear information of the sample distribution and does not fit real data well. The feature mapping method based on random Fourier features (RFF) maps data to the feature space of an approximate kernel function, so that the inner product of any two points in the feature space approximates their kernel function value. Compared with PCA, RFF can preserve more of the data distribution information and obtain features by either reducing or raising the dimension, which makes it suitable for feature compression. SKLSH [18] is a classical hashing algorithm based on RFF, which achieves good experimental results with long-bit codes. The long-bit hash learning algorithm first maps the sample points from the original n-dimensional real space to the n-dimensional feature space of the approximate kernel function by RFF. Because of the consistency convergence of RFF, the kernel similarity between any two sample points can be maintained.
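The product quantization loss (2) can be sketched as follows; the two per-subspace codebooks here are toy values chosen for illustration:

```python
import numpy as np

def pq_encode(x, codebooks):
    """Quantize x subspace-by-subspace; total loss is the sum of per-subspace VQ losses."""
    m = len(codebooks)
    parts = np.split(x, m)            # m mutually exclusive subspaces
    total, codes = 0.0, []
    for sub, cb in zip(parts, codebooks):
        d = ((cb - sub) ** 2).sum(axis=1)  # distance to each codeword of this subspace
        j = int(d.argmin())
        codes.append(j)
        total += d[j]
    return codes, total

# two 2-D subspaces, each with its own small codebook
cbs = [np.array([[0.0, 0.0], [1.0, 1.0]]),
       np.array([[2.0, 2.0], [3.0, 3.0]])]
codes, loss = pq_encode(np.array([0.1, 0.0, 2.9, 3.1]), cbs)   # codes = [0, 1]
```

The full codebook is the Cartesian product of the small ones, so m codebooks of size k index k^m cells while storing only m·k codewords.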
Specifically, for two points x and y, the translation-invariant kernel function [12] satisfies

K(x − y) = E[Φ_{w,b}(x) · Φ_{w,b}(y)], with Φ_{w,b}(x) = η cos(w^T x + b), (3)

where b follows the uniform distribution on [0, 2π], w obeys the probability distribution P_K induced by the translation-invariant kernel function, and η is a constant parameter.
Thus, the mapping from the n-dimensional space to the d-dimensional feature space of the approximate kernel function can be obtained by

Φ(x) = (Φ_{w_1,b_1}(x), Φ_{w_2,b_2}(x), ..., Φ_{w_d,b_d}(x)), (4)

where w_1, w_2, ..., w_d are i.i.d. samples from the probability distribution P_K and b_1, b_2, ..., b_d are i.i.d. samples from the uniform distribution on [0, 2π]. When the translation-invariant kernel function is a Gaussian kernel, K(x − y) = e^{−(γ/2)‖x−y‖^2}, P_K is a Gaussian distribution, i.e., P_K ∼ Normal(0, γ I_{n×n}).
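A minimal sketch of this RFF mapping for the Gaussian kernel, with hypothetical dimensions and bandwidth (n = 8, d = 512, γ = 0.5) chosen only for illustration; η = sqrt(2/d) normalizes the feature inner product:

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim, d_feat, gamma = 8, 512, 0.5

# w_i ~ Normal(0, gamma * I) induces the Gaussian kernel; b_i uniform on [0, 2*pi)
W = rng.normal(0.0, np.sqrt(gamma), size=(d_feat, n_dim))
b = rng.uniform(0.0, 2 * np.pi, size=d_feat)

def rff(x):
    """Map x to the d-dimensional approximate kernel feature space."""
    return np.sqrt(2.0 / d_feat) * np.cos(W @ x + b)

x = rng.normal(size=n_dim)
y = rng.normal(size=n_dim)
approx = rff(x) @ rff(y)                              # inner product in feature space
exact = np.exp(-0.5 * gamma * np.sum((x - y) ** 2))   # K(x - y)
```

By the consistency convergence of RFF, `approx` tends to `exact` as d grows, with error roughly O(1/sqrt(d)).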

Orthogonal Procrustes Problem.
An orthogonal Procrustes problem is to solve for an orthogonal transformation matrix O such that PO is as close to Q as possible, i.e.,

min_O ‖PO − Q‖_F^2, s.t. O^T O = I. (5)

This problem has a closed-form solution: if P^T Q = UΩV^T is a singular value decomposition, then the optimum is O = UV^T. This solution is used below when optimizing the rotation matrix of AVBH.
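The closed-form solution can be sketched as follows; P and the true rotation are random toy values, constructed so that a perfect alignment exists:

```python
import numpy as np

def procrustes(P, Q):
    """Closed-form solution of min_O ||P O - Q||_F with O^T O = I, via SVD of P^T Q."""
    U, _, Vt = np.linalg.svd(P.T @ Q)
    return U @ Vt

rng = np.random.default_rng(1)
P = rng.normal(size=(10, 3))
# build Q as a rotated copy of P, so the optimum aligns them exactly
O_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Q = P @ O_true
O = procrustes(P, Q)
err = np.linalg.norm(P @ O - Q)   # essentially zero
```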

Algorithm Framework.
For a general hash learning algorithm, the length of the learned hash code is fixed. AVBH uses the idea of the asymmetric hashing algorithm: the hash codes for the dataset are short and of variable length, while the code of the query point is long and fixed. The steps of the AVBH hashing algorithm are shown in Figure 1, which mainly includes the dataset encoding steps ①-③ and the query point encoding step ④. The dataset encoding section consists of two phases: random Fourier feature encoding (RFF encoding) and variable bit encoding (AVBH encoding). First, step ① uses the random Fourier feature (RFF) to map the dataset and obtain the RFF encoding. After RFF coding, considering the differences in RFF code frequency, the RFF codes are sorted by frequency in step ②. According to this ranking, the original dataset can be divided into subsets encoded with lengths k_1, k_2, ..., k_L, as shown in the figure. As shown in step ③, the AVBH subset codes of lengths k_1, k_2, ..., k_L are expanded by duplicating them (n/k_1), (n/k_2), ..., (n/k_L) times sequentially, so that an n-dimensional Hamming code is formed.
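The duplication in step ③ can be sketched with `np.tile`; the two codes below are hypothetical ±1 codes, not values from the paper:

```python
import numpy as np

def expand_code(code, n):
    """Duplicate a k-bit code n/k times to form an n-bit code (k must divide n)."""
    k = len(code)
    assert n % k == 0, "code length must divide the query code length"
    return np.tile(code, n // k)

# a frequent point keeps a long (8-bit) code; a rare point gets a short (4-bit)
# code that is duplicated twice to reach the common query length n = 8
long_code = np.array([1, -1, 1, 1, -1, 1, -1, -1])
short_code = np.array([1, -1, -1, 1])
expanded = expand_code(short_code, 8)
```

Storage holds only the short codes; expansion is needed only at query time.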
In the query point encoding section, the query point is quantized into an RFF encoding of length n by step ④.

Objective Function.
The target of the AVBH method is to obtain L groups of hash codes with lengths k_1, k_2, ..., k_L through the hash function G(x), where N = Σ_{l=1}^{L} N_l and n = Σ_{l=1}^{L} k_l. This divides the dataset into subsets B^(1), B^(2), ..., B^(L) according to the RFF encoding frequency. By cascading the codes of each subset (n/k_1), (n/k_2), ..., (n/k_L) times, respectively, we get L groups of n-bit hash codes; combining B^(1), B^(2), ..., B^(L) then gives an n-bit hash code matrix B. The AVBH method calculates similarity during the query process as the Hamming distance between the hash code of the query point and the concatenated dataset codes. Therefore, for the dataset, we need to construct the hash mapping function so that the L groups of hash codes with lengths k_1, k_2, ..., k_L preserve the original information as much as possible. The AVBH method obtains the hash mapping function by minimizing the reconstruction error (8) between the cascaded encoding B and the n-dimensional sample matrix Y:

min_{B,R} ‖Y − RB‖_F^2, (8)

where R is an orthogonal n × n rotation matrix, namely, R^T R = RR^T = I. Combining the properties of the matrix trace and the definition of the F-norm of matrices, we get

‖Y − RB‖_F^2 = tr((Y − RB)^T (Y − RB)). (9)

As the unknown variables B and R in formula (8) appear as a product, the expansion of (9) contains terms with two unknown variables and is difficult to solve directly. After further simplification, we get

‖Y − RB‖_F^2 = tr(Y^T Y) − 2 tr(B^T R^T Y) + tr(B^T B). (10)

As B ∈ {+1, −1}^{n×N}, it is easy to get tr(B^T B) = nN. As R^T R = I, tr(Y^T Y) = c, where c is unrelated to B and R. So formula (10) is simplified as

min_{B,R} c + nN − 2 tr(B^T R^T Y), i.e., max_{B,R} tr(B^T R^T Y). (11)

Thus, minimizing the reconstruction error (8) equals minimizing the quantization error (11). The objective of AVBH for encoding the dataset is to minimize the reconstruction error of the concatenated n-bit encoding by finding the orthogonal rotation matrix R and the code matrix B.
In the extreme case where the dataset is uniformly distributed, there is no significant difference in the frequencies of the hash codes, and the AVBH method degenerates into the ACH algorithm [16]. Compared with the ACH hashing algorithm, AVBH adapts better to real data because it can handle data of various distributions, so its generalization ability is stronger.

Optimization Algorithm.
The objective function (11) can be optimized by alternating optimization. Namely, the rotation matrix R is first fixed, and the encoding matrix B is optimized to reduce the objective value. Then the encoding matrix B is fixed, and the rotation matrix R is optimized to reduce the objective value. In this way, the value of the objective function decreases until it converges. The following discusses how each optimization step is carried out.
(1) Fix the rotation matrix R, and optimize the encoding matrix B. Let V = R^T Y, and let V_lm be the submatrix of V consisting of rows (m − 1)k + 1 to mk and columns (l − 1)k + 1 to lk. From formula (11), we get

‖Y − RB‖_F^2 = c + nN − 2 tr(B^T V). (12)

As n, N, and c are unrelated to B, for a fixed R, minimizing (12) is equal to maximizing

tr(B^T V). (13)

As B^(l)_{ij} ∈ {+1, −1}, the optimal analytic solution of formula (13) takes, for each subset, the sign of the entries of V summed over the duplicated blocks:

B^(l) = sign(Σ_m V_lm). (14)

(2) Fix the encoding matrix B, and optimize the rotation matrix R.
Under R^T R = RR^T = I, the problem of minimizing formula (11) with B fixed is an orthogonal Procrustes problem [9]. The optimal R is obtained as follows. Minimizing

‖Y − RB‖_F^2 (15)

is equal to maximizing

tr(B^T R^T Y) = tr(R^T Y B^T). (16)

By calculating the SVD of YB^T, we get

YB^T = UΩC^T, (17)

where U is the matrix of left singular vectors, C is the matrix of right singular vectors, and Ω is the diagonal matrix of the corresponding singular values, whose diagonal elements satisfy Ω_ii ≥ 0, i ∈ [1, n]. Combining formulas (16) and (17), we get

tr(R^T Y B^T) = tr(R^T U Ω C^T) = tr((RC)^T U Ω) = Σ_i A_ii Ω_ii, (18)

where A = (RC)^T U. Write R' = RC, and let A_ii denote the diagonal elements of A and R'_i, U_i the i-th columns of R', U, respectively. By the Cauchy-Schwarz inequality [11],

A_ii = ⟨R'_i, U_i⟩ ≤ ‖R'_i‖ ‖U_i‖ = 1. (19)

So formula (18) can be bounded as

Σ_i A_ii Ω_ii ≤ Σ_i Ω_ii. (20)

Combining formula (19), equality in (20) holds when R'_i = U_i for all i, i.e., RC = U. Hence, when

R = UC^T, (21)

formula (16) takes its maximum value and formula (15) takes its minimum value. As a result, we get the optimal R by formula (21).
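The two alternating steps can be sketched together as follows. This is a simplified sketch, not the full algorithm: the variable-bit duplication constraint on B (formula (14)'s block summation) is omitted for brevity, so the B-step is a plain sign update; the R-step is exactly the Procrustes update (21):

```python
import numpy as np

def avbh_fit(Y, iters=20):
    """Alternating minimization of ||Y - R B||_F^2 over B in {+1,-1} and orthogonal R
    (simplified: fixed-length codes, no duplication constraint)."""
    n, N = Y.shape
    R = np.eye(n)
    for _ in range(iters):
        B = np.sign(R.T @ Y)               # B-step, formula-(14)-style sign update
        B[B == 0] = 1                      # break ties in the rare zero case
        U, _, Ct = np.linalg.svd(Y @ B.T)  # R-step: Procrustes, R = U C^T
        R = U @ Ct
    return B, R

rng = np.random.default_rng(2)
Y = rng.normal(size=(8, 100))              # toy RFF feature matrix, n = 8, N = 100
B, R = avbh_fit(Y)
loss = np.linalg.norm(Y - R @ B) ** 2
```

Each step is a closed-form minimizer of the objective with the other variable fixed, so the loss is non-increasing across iterations, matching the convergence argument below.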

Dataset Encoding.
When the objective function value converges, we get the mapping function G(y) of AVBH for the dataset according to formula (14) with V = R^T y, where y is the random Fourier feature (RFF) obtained from the sample point x in the mapping stage.

Query Point Encoding.
The optimal rotation matrix R is obtained from the training process of dataset encoding. For a point q in the query set, the main goal of encoding is to keep as much accurate information as possible, so the query set encoding does not need to be compressed and is mapped directly to an n-bit hash code. Combining formula (14), the mapping function of AVBH for the query set is

F(q) = sign(R^T y_q),

where y_q is the RFF of the query point q.
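At query time, the asymmetric comparison is a plain Hamming distance between the n-bit query code and each cascade-concatenated dataset code. A small sketch with hypothetical ±1 codes (not values from the paper):

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two ±1 codes of equal length."""
    return int((a != b).sum())

# hypothetical 8-bit query code and a 4-bit dataset code expanded to 8 bits
query_code = np.array([1, -1, 1, 1, -1, 1, -1, -1])
data_code = np.tile(np.array([1, -1, -1, 1]), 2)   # cascade concatenation
dist = hamming(query_code, data_code)
```

Because only the short code is stored and `np.tile` is cheap (or can be folded into the distance computation), the asymmetry saves storage without changing the comparison cost.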

Convergence Analysis of AVBH.
According to the objective function (8), we can write the loss with an n × N constant matrix D that satisfies two conditions: (1) the sign of each element of D (positive or negative) is the same as that of the corresponding element of R^T Y, and (2) each element of D is not greater than the corresponding element of R^T Y. Therefore, the optimization of Loss(B, R) is transformed into two suboptimization problems, Loss_1(B) and Loss_2(R). Specifically, for the subproblem Loss_1(B), formula (14) gives the optimal solution; therefore, the value after the update of formula (14) is guaranteed to be less than or equal to the value before it. For the subproblem Loss_2(R), formula (21) gives the optimal solution; therefore, the value after the update of formula (21) is also guaranteed to be less than or equal to the value before it.
Combining the two parts, alternating the updates (14) and (21) guarantees that the objective value after each update is less than or equal to the value before it. Since the objective is bounded below, we conclude that the AVBH algorithm is convergent.

Experimental Datasets
CIFAR-10.
It is a set of 60,000 32 × 32 colour images in 10 categories, each category containing 6,000 images. In this experiment, 320-D GIST features were extracted for each image in the dataset. We randomly selected 1,000 images as the test data and the remaining 59,000 as the training data. In the training data, the closest 50 data points (based on the Euclidean distance) to a test point were regarded as its nearest neighbours.

SIFT.
It is a local SIFT feature set containing 1,000,000 128-D descriptors. 100,000 of these samples were randomly selected as the training data and 10,000 other samples as the test data.

GIST.
It is a global GIST feature set containing 1,000,000 960-D descriptors. 500,000 of these samples were randomly selected as the training data and 1,000 samples as the test data.

Performance Evaluation.
The performance of AVBH was evaluated mainly by the relationship between query precision and recall. We define

precision = (number of retrieved true nearest neighbours) / (number of retrieved points),
recall = (number of retrieved true nearest neighbours) / (number of true nearest neighbours).

For the sake of fairness, the average encoding length of AVBH was set to the encoding length of the other methods on a given dataset. Figures 2-4 show precision-recall curves for Euclidean neighbour retrieval for several methods on CIFAR-10, SIFT, and GIST with Euclidean neighbour ground truth.
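The two measures above can be computed as follows; the retrieved set and ground-truth neighbours are toy IDs for illustration:

```python
def precision_recall(retrieved, true_neighbours):
    """precision = |retrieved ∩ true| / |retrieved|; recall = |retrieved ∩ true| / |true|."""
    hit = len(set(retrieved) & set(true_neighbours))
    return hit / len(retrieved), hit / len(true_neighbours)

# 4 points retrieved, 2 of which are among the 3 true nearest neighbours
p, r = precision_recall(retrieved=[1, 2, 3, 4], true_neighbours=[2, 4, 5])
```

Sweeping the number of retrieved points traces out the precision-recall curves of Figures 2-4.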

Scientific Programming
Our method, AVBH, achieves better precision performance on the three datasets. As the AVBH algorithm uses variable bit codes, its total code length is less than that of the other algorithms. As a result, our method effectively reduces the storage cost and improves query accuracy.

Conclusion
In this paper, an asymmetric learning to hash with variable bit encoding algorithm was proposed. By frequency statistics of the random Fourier feature encoding of the dataset, we compress high-frequency hash codes into longer encoding representations and low-frequency hash codes into shorter ones. A query point is quantized to a long-bit hash code and compared with data points whose codes are cascade-concatenated to the same length to retrieve the nearest neighbours. This ensures that the original data information is preserved as much as possible while the data are compressed, balancing coding compression against query performance. Experiments on open datasets show that the proposed algorithm effectively reduces the storage cost and improves query accuracy. As we use a two-stage algorithm framework for generating the hash codes, the training stage costs a lot of time. In future work, we will work on simplifying the training process.

Data Availability
The datasets used in the experiments of this paper are as follows.
(1) CIFAR-10 (available at http://www.cs.toronto.edu/∼kriz/cifar.html): it is a set of 60,000 32 × 32 colour images in 10 categories, each category containing 6,000 images. In this experiment, 320-D GIST features were extracted for each image in the dataset. We randomly selected 1,000 images as the test data and the remaining 59,000 as the training data. In the training data, the closest 50 data points (based on the Euclidean distance) to a test point were regarded as its nearest neighbours. (2) SIFT (available at http://corpus-texmex.irisa.fr): it is a local SIFT feature set containing 1,000,000 128-D descriptors. 100,000 of these samples were randomly selected as the training data and 10,000 other samples as the test data. (3) GIST (available at http://corpus-texmex.irisa.fr): it is a global GIST feature set containing 1,000,000 960-D descriptors. 500,000 of these samples were randomly selected as the training data and 1,000 samples as the test data.

Conflicts of Interest
The authors declare that they have no conflicts of interest.