Accumulative Quantization for Approximate Nearest Neighbor Search

To further improve approximate nearest neighbor (ANN) search performance, an accumulative quantization (AQ) method is proposed and applied to ANN search. It approximates a vector with the accumulation of several centroids, each selected from a different codebook. To approximate an input vector accurately, an iterative optimization is designed for training the codebooks, improving their approximation power. In addition, another optimization is introduced into the offline vector quantization procedure to minimize the overall quantization error. A hypersphere-based filtration mechanism is designed for AQ-based exhaustive ANN search to reduce the number of candidates put into sorting, thus yielding better search time efficiency. For a query vector, a hypersphere centered on the query is constructed, and vectors lying outside the hypersphere are filtered out. Experimental results on public datasets demonstrate that hypersphere-based filtration improves ANN search time efficiency without weakening search accuracy; moreover, the proposed AQ is superior to the state of the art in ANN search accuracy.


Introduction
Nearest neighbor (NN) search is fundamental and important in many applications, such as machine learning, image classification, content-based image retrieval, deep learning, feature matching [1], and image interpolation [2]. The goal of NN search is to find the vector in a database whose distance to the query vector is the smallest according to a predefined distance metric. The natural solution is exact nearest neighbor search, which is inherently expensive for large-scale collections and high-dimensional vectors due to the "curse of dimensionality" [3]. This difficulty has led to the development of approximate nearest neighbor (ANN) search. The key idea shared by ANN methods is to find the NN with high probability "only," instead of probability 1 [4], either by exhaustive search or by nonexhaustive search based on an index [5][6][7].
Hash-based nearest neighbor search methods map vectors from Euclidean space into Hamming space, using binary codes to represent the vectors [8].
The similarity between vectors is measured by the Hamming distance between the codes. Such methods include small binary codes [3], spectral hashing [9], spherical hashing [10], Hamming embedding [11], mini-BOF [12], and k-means hashing [13]. These methods make it possible to store large-scale vector collections in memory and perform nearest neighbor search efficiently. While promising, when the number of bits used for encoding vectors is fixed, the number of possible Hamming distances is fixed as well. Therefore, the discrimination of the Hamming distance is restricted by the code length.
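The limited discrimination is easy to see: a B-bit code admits only B + 1 distinct Hamming distance values (0 through B), no matter how many database vectors there are. A minimal illustration (the `hamming` helper is ours, not from any cited work):

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two equal-length binary codes."""
    return bin(a ^ b).count("1")

B = 32                      # code length in bits
codes = [0b1010, 0b1000, 0b0110]
query = 0b1010
dists = [hamming(query, c) for c in codes]
# with B-bit codes, every distance falls in the B + 1 values 0..B,
# so many distinct vectors collapse onto the same distance
assert all(0 <= d <= B for d in dists)
```

Euclidean-distance-based quantization methods, discussed next, avoid this coarse granularity.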
There are ANN search methods that address the nearest neighbor search problem with efficient quantization technology [14], adopting the Euclidean distance, which offers better discrimination than the Hamming distance. As a representative work, product quantization (PQ) was first introduced into ANN search [4]: the vector space is decomposed into a Cartesian product of low-dimensional subspaces, and a vector is represented by a short code composed of its subspace quantization indices. An asymmetric Euclidean distance is designed to accelerate approximate distance computation between two vectors. PQ is proved to be superior to the Hamming distance in terms of the trade-off between accuracy and search time efficiency. Many PQ variants [15][16][17][18][19][20][21][22][23] improve the performance in different ways, such as optimized product quantization (OPQ) [16], product quantization with dual codebooks [19], Cartesian k-means [20], and quarter product quantization (QPQ) [21].
PQ assumes that the dimension components of vectors are statistically independent of each other, which does not hold for all real data. In contrast to PQ-based methods, which partition the vector space into several subspaces, another representative line of quantization research approximates a vector v by the addition of L centroids c_{v,l}, each selected from one codebook:

v ≈ Σ_{l=1}^{L} c_{v,l}.  (1)

Then, the vector is represented by a short code composed of the indices of the L selected centroids. Typical works include additive quantization [24] and composite quantization (CQ) [25]. CQ trains codebooks by introducing a near-orthogonality constraint, while additive quantization minimizes the quantization error over each dimension during codebook training. In contrast, residual vector quantization (RVQ) [26,27] is a sequential multistage quantization technique consisting of several stage-quantizers: except at the first stage, the vectors used to train each stage-codebook are the residual vectors generated by the preceding stage-quantizer. Enhanced RVQ (ERVQ) [28] improves approximation accuracy by designing a joint optimization that reduces the overall quantization error during codebook training. Based on RVQ, projected residual vector quantization [29] improves training efficiency by projecting vectors into a low-dimensional space, while projection-based enhanced residual quantization [30] builds on ERVQ.
In this paper, we propose an accumulative quantization method for ANN search to further improve search accuracy. This paper offers the following contributions: (1) Accumulative quantization is proposed to represent a vector as the sum of L partial vectors, quantized by L codebooks, respectively. For this, each vector is first decomposed into L partial vectors of the same dimension as the original one. Then, L initial codebooks are trained on the L partial vector sets independently. To improve the approximation power of the codebooks, an optimization is introduced that minimizes the overall error between the original vector and the vector reconstructed by accumulative quantization. (2) In the ANN search procedure, R search results are usually returned to obtain good search accuracy. Whether the search is exhaustive or nonexhaustive, the candidate vectors are sorted by their distances to the query vector to obtain the R most probable results, so the number of candidate vectors restricts the time efficiency of ANN search. Since the nearest neighbors of a query are located close to it in the vector space, we propose a hypersphere filtration strategy, which is simple but effective in improving search time efficiency: a hypersphere is constructed with each query vector as its center, and only the candidates located inside the hypersphere are put into sorting. This paper is organized as follows: Section 2 presents accumulative quantization (AQ). An asymmetric distance with uniform scalar quantization is described in Section 3. Section 4 introduces the hypersphere-based filtering strategy and its combination with AQ-based exhaustive ANN search. The performance of our approaches and comparisons with the state of the art are reported in Section 5. Conclusions are discussed in Section 6.

Accumulative Quantization
Given a vector v, accumulative quantization approximates the vector as the sum of L partial vectors, where each partial vector is quantized with a pretrained codebook:

v ≈ v̂ = Σ_{l=1}^{L} v̂_l,  (2)

where v̂_l is the quantization output, a centroid selected from the lth codebook. Then, vector v is represented by the L-tuple of indices of the centroids corresponding to v̂_l. The quantization accuracy can be measured by the difference between v and its reconstructed vector v̂, expressed as the mean squared error (MSE). The smaller the MSE is, the better the codebooks are. The proposed accumulative quantization aims to minimize the MSE in the processes of training the L codebooks and of encoding vectors, respectively.
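The reconstruction in (2) and the MSE criterion can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation; all names (`reconstruct`, `mse`) and the random codebooks are ours:

```python
import numpy as np

def reconstruct(codebooks, codes):
    """v_hat = sum over l of the selected centroid from codebook l."""
    return sum(cb[c] for cb, c in zip(codebooks, codes))

def mse(X, X_hat):
    """Mean squared error between vectors and their reconstructions."""
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))

rng = np.random.default_rng(0)
L, k, d = 2, 4, 8                                   # toy sizes
codebooks = [rng.normal(size=(k, d)) for _ in range(L)]
v_hat = reconstruct(codebooks, codes=(1, 3))        # v ~ c_{1,1} + c_{2,3}
assert v_hat.shape == (d,)
```

A vector is thus stored as L small integers (its codes), and recovered approximately by summing the corresponding centroids.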

Codebook Training. Given a training vector set X = {x_1, x_2, . . . , x_N}, x_i ∈ R^d, accumulative quantization initially decomposes each training vector into L partial vectors of the same dimension as the original vector, such that

x_i = Σ_{l=1}^{L} x_{i,l}.  (3)

Then, the training set is decomposed into L partial vector sets X(l) = {x_{i,l}} (l = 1, 2, . . . , L), where x_{i,l} denotes the lth partial vector of x_i.

Computational Intelligence and Neuroscience

Figure 1(a) shows the framework of codebook training for the proposed accumulative quantization, which consists of initial codebook training and codebook optimization.

Initial Codebook Training.
To train the L initial codebooks, the k-means algorithm is performed on each training set X(l) to generate k centroids as the codebook C_l = {c_{l,1}, c_{l,2}, . . . , c_{l,k}}.
Then, after decomposing a vector x_i into L partial vectors x_{i,l} according to (3), it can be quantized by the L codebooks independently:

x̂_{i,l} = argmin_{c_{l,j} ∈ C_l} d(x_{i,l}, c_{l,j}),  (4)

where x̂_{i,l} denotes the quantization output of the lth partial vector x_{i,l} and d(x_{i,l}, c_{l,j}) denotes the Euclidean distance between x_{i,l} and the jth centroid c_{l,j} in codebook C_l. According to formula (3), the training error can be measured by the mean squared Euclidean distance between x_i and its reconstruction x̂_i:

MSE = (1/N) Σ_{i=1}^{N} ‖x_i − Σ_{l=1}^{L} x̂_{i,l}‖²,  (5)

where x_i − Σ_{l=1}^{L} x̂_{i,l} is denoted as e_i, representing the overall quantization error of x_i. Also, e_i = Σ_{l=1}^{L} e_i(l), where e_i(l) = x_{i,l} − x̂_{i,l} denotes the quantization error of the partial vector x_{i,l} produced by C_l.
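The initial training stage can be sketched as follows, using a bare-bones Lloyd's k-means in NumPy. Function names and the toy random data are ours (the real partial vectors would come from the decomposition in (3)); a production system would use an optimized k-means:

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest centroid
        a = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(a == j):
                C[j] = X[a == j].mean(axis=0)
    return C

def quantize(X, C):
    """Nearest-centroid assignment, as in formula (4)."""
    return np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)

rng = np.random.default_rng(1)
N, d, L, k = 200, 8, 2, 4
parts = [rng.normal(size=(N, d)) for _ in range(L)]   # stand-ins for X(l)
books = [kmeans(Xl, k) for Xl in parts]               # initial codebooks
codes = [quantize(Xl, Cl) for Xl, Cl in zip(parts, books)]
# overall error e_i = x_i - sum_l xhat_{i,l}, averaged as in (5)
X = sum(parts)
X_hat = sum(Cl[cl] for Cl, cl in zip(books, codes))
err = float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))
assert err >= 0.0
```

Each codebook is fit to its own partial vector set independently at this stage; reducing the overall error `err` is the job of the optimization described next.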

Codebook Optimization.
The objective function used to train each codebook above minimizes the error between x_{i,l} and x̂_{i,l} within each partial vector set, not the MSE in (5); thus, the L codebooks may not be the optimal solution for the whole vectors.
Here, a codebook optimization is designed in an alternating manner, in which each step updates one group of parameters while fixing the others.
Update C_l. Fixing {C_{l′}, l′ ≠ l} and {X(l′), l′ = 1, 2, . . . , L}, the problem is transformed into recomputing the centroids for the vectors x_i − Σ_{l′≠l} x̂_{i,l′} = x̂_{i,l} + e_i with the objective of minimizing the MSE. For each vector x̂_{i,l} + e_i, a naive solution is to use x̂_{i,l} + e_i itself as a new centroid, replacing the closest centroid, so that its quantization error is reduced to 0. However, this strategy may significantly increase the number of centroids in C_l, so it is not practical. Inspired by k-means, we design a mean mechanism to update each c_{l,j} in C_l, where c_{l,j} is recomputed as the mean of the vectors whose nearest centroid is c_{l,j}:

c_{l,j} = (1/N_{c_{l,j}}) Σ (x̂_{i,l} + e_i),  (6)

where the sum runs over the vectors whose nearest centroid is c_{l,j} and N_{c_{l,j}} denotes their number. Update X̂(l) = {x̂_{i,l}}. After optimizing C_l, with {C_{l′}, l′ ≠ l} and {X(l′)} fixed, the lth codebook has changed, so the quantization outputs of X(l) should be updated accordingly. The quantization outputs of the vectors in X(l) are independent of each other, so the optimization of X̂(l) can be decomposed into N suboptimizations, each solved with fixed C_{l′} and X̂(l′), l′ ≠ l:

x̂_{i,l} = argmin_{c ∈ C_l} ‖x_i − Σ_{l′≠l} x̂_{i,l′} − c‖².  (7)

The codebooks are optimized in an iterative manner. One iteration optimizes the 1st through the Lth codebook sequentially. When the objective function value, the MSE in formula (5), converges, the codebook optimization ends.
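One "update C_l" step of this alternating scheme can be sketched as follows. The setup is a toy with random residuals; `update_codebook` implements the mean rule of formula (6) under a fixed assignment (all names are ours):

```python
import numpy as np

def update_codebook(Cl, targets, assign):
    """Formula (6): each centroid becomes the mean of the vectors
    x_{i,l} + e_i currently assigned to it."""
    C = Cl.copy()
    for j in range(len(Cl)):
        mask = assign == j
        if mask.any():
            C[j] = targets[mask].mean(axis=0)
    return C

# toy setup: partial vectors plus current overall residuals e_i
rng = np.random.default_rng(2)
N, d, k = 100, 4, 8
Xl = rng.normal(size=(N, d))        # stand-in for x_{i,l}
e = 0.1 * rng.normal(size=(N, d))   # e_i, residual from the other codebooks
Cl = Xl[:k].copy()                  # current lth codebook
targets = Xl + e                    # the vectors x_{i,l} + e_i
assign = np.argmin(((targets[:, None] - Cl[None]) ** 2).sum(-1), axis=1)
Cl_new = update_codebook(Cl, targets, assign)
# with the assignment fixed, the mean update cannot increase the error
old = float(((targets - Cl[assign]) ** 2).sum())
new = float(((targets - Cl_new[assign]) ** 2).sum())
assert new <= old + 1e-9
```

Because the per-cluster mean minimizes the within-cluster squared error, each such step is non-increasing in the objective, which is why alternating sweeps over l = 1, . . . , L converge.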

AQ-Based Vector Quantization.
Given a vector v, vector quantization is supposed to generate an L-tuple of centroids by accumulative quantization to approximate v. The indices (binary code) of those L centroids are used as the code representing the input vector. This can be achieved by selecting one centroid from each codebook C_l (l = 1, . . . , L) so as to minimize the overall quantization error ‖v − Σ_{l=1}^{L} v̂_l‖². A natural way is to compare all the L-tuples and select the best one. However, with each codebook containing k centroids, there are k^L combinations to compare, which greatly weakens efficiency and is thus not practical.
We propose a vector quantization method for accumulative quantization, consisting of 2 procedures: initial quantization and quantization output optimization, shown in Figure 1(b). The initial quantization procedure quantizes v with the L quantizers independently after decomposing v into L partial vectors. The quantization output optimization procedure uses the overall quantization error e_v to sequentially update the l′th quantization output from l′ = 1 to l′ = L with the L codebooks and the other L − 1 quantization outputs fixed.

Initial Quantization.
The vector v is first decomposed into L partial vectors v_l according to (3). Then, each partial vector v_l is quantized by the corresponding quantizer Q_l according to formula (4). Thus, v can be approximated by its reconstructed vector v̂ = Σ_{l=1}^{L} v̂_l.

Quantization Outputs Optimizing. Each partial vector v_l is quantized so as to minimize the error between v_l and v̂_l, measured by ‖v_l − v̂_l‖². While the initial quantization procedure simplifies the process of quantizing v, the reconstructed vector Σ_{l=1}^{L} v̂_l may not be the best approximation of v. The reason is that each v̂_l is obtained by minimizing only the local quantization error, not the overall quantization error ‖v − Σ_{l=1}^{L} v̂_l‖² between v and its reconstruction. An iterative optimization is therefore proposed to improve the L-tuple of quantization outputs, keeping the L codebooks constant.

Like the codebook optimization, each v̂_l is optimized in an alternating manner. Optimize v̂_l. Fixing the other L − 1 quantization outputs, the vector v − Σ_{l′≠l} v̂_{l′} is computed and taken as the input of the lth quantizer. Then, it is quantized according to formula (4), so that the lth quantization output is updated under the condition of minimizing the overall quantization error. The L-tuple of quantization outputs is optimized iteratively from the 1st to the Lth sequentially. The iteration stops when the L-tuple of quantization outputs no longer changes. Experiments show that the proposed vector encoding method converges rapidly within a small number of iterations, as shown in Figure 2. Lower quantization error brings the benefit that vectors gain a better approximation with the L codebooks fixed.
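The encoding loop above can be sketched as follows. Note one simplification labeled in the comments: for brevity the initialization here is a greedy residual chain rather than the paper's decompose-then-quantize step; the refinement sweeps, however, follow the described scheme (re-quantize v minus the other outputs, codebooks fixed). All names are ours:

```python
import numpy as np

def encode(v, books, iters=5):
    """Greedy residual-chain init (a simplification of the paper's
    initial quantization), then alternating refinement sweeps."""
    codes, r = [], v.copy()
    for C in books:                       # greedy initialization
        j = int(np.argmin(((C - r) ** 2).sum(-1)))
        codes.append(j)
        r = r - C[j]
    for _ in range(iters):                # refinement, codebooks fixed
        changed = False
        for l, C in enumerate(books):
            others = sum(books[m][codes[m]]
                         for m in range(len(books)) if m != l)
            # re-quantize v - sum_{l' != l} vhat_{l'} with codebook C_l
            j = int(np.argmin(((C - (v - others)) ** 2).sum(-1)))
            if j != codes[l]:
                codes[l], changed = j, True
        if not changed:                   # L-tuple stopped changing
            break
    return codes

def recon_err(v, books, codes):
    return float(((v - sum(C[j] for C, j in zip(books, codes))) ** 2).sum())

rng = np.random.default_rng(3)
d, k, L = 6, 16, 3
books = [rng.normal(size=(k, d)) for _ in range(L)]
v = rng.normal(size=d)
codes = encode(v, books)
# refinement never increases the overall quantization error
assert recon_err(v, books, codes) <= recon_err(v, books, encode(v, books, iters=0)) + 1e-9
```

Each inner update picks the error-minimizing centroid with the current code as a candidate, so the overall error is monotonically non-increasing and the sweep terminates once the tuple is stable.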

Fast Distance Computation
When performing ANN search, the distance between the query vector q and each vector y_i in the database needs to be computed, where the quantization output ŷ_i of y_i is denoted as the L-tuple (ŷ_{i,1}, . . . , ŷ_{i,l}, . . . , ŷ_{i,L}). Based on accumulative quantization, an asymmetric Euclidean distance computation is proposed to accelerate ANN search:

d(q, y_i)² ≈ ‖q − Σ_{l=1}^{L} ŷ_{i,l}‖² = ‖q‖² − 2⟨q, Σ_{l=1}^{L} ŷ_{i,l}⟩ + ‖Σ_{l=1}^{L} ŷ_{i,l}‖².  (8)

For query vector q, the term ‖q‖² is constant over all database vectors and does not affect the ANN search result, so it does not need to be computed.
(1) Evaluating the term ⟨q, Σ_{l=1}^{L} ŷ_{i,l}⟩: this term can be transformed into Σ_{l=1}^{L} ⟨q, ŷ_{i,l}⟩. Each term ⟨q, ŷ_{i,l}⟩ can then be obtained from a lookup table in which the inner products between q and the L × k centroids are precomputed when q is submitted.
(2) Evaluating the term ‖Σ_{l=1}^{L} ŷ_{i,l}‖²: if it is computed online when a query vector is submitted, ANN search time efficiency is inevitably decreased. Evaluating ‖Σ_{l=1}^{L} ŷ_{i,l}‖² can be transformed into computing Σ_{l=1}^{L} Σ_{l′=1}^{L} ⟨ŷ_{i,l}, ŷ_{i,l′}⟩, which can be obtained by constructing L²/2 lookup tables of size k × k, but the computation cost is large [25]. Another way is to compute the squared length ‖Σ_{l=1}^{L} ŷ_{i,l}‖² = ‖ŷ_i‖² of the reconstructed vector ŷ_i offline and store it in a lookup table when quantizing y_i. However, each database vector then needs 4 bytes to store ‖ŷ_i‖².
Here, a simple uniform scalar quantization is designed to encode ‖ŷ_i‖² with a few binary bits, named length bits. For example, if 1 byte is used to store ‖ŷ_i‖², then ‖ŷ_i‖² can be quantized with 256 discrete scale values, where a scale value is selected to approximate ‖ŷ_i‖² and its index is used to denote it. In this case, the proposed uniform quantization of ‖ŷ_i‖² can be written as

code(‖ŷ_i‖²) = [255 · (‖ŷ_i‖² − s_min)/(s_max − s_min)],  (9)

where [·] rounds to an integer value ranging from 0 to 255 and s_min and s_max bound the squared lengths over the database. This is performed when the database vectors are quantized offline, and the results are stored in a lookup table. In the experiments later, we show the influence of the choice of length bits on ANN search accuracy.
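Both ingredients of the fast distance can be sketched together: per-codebook inner-product tables for term (1), and a uniform scalar quantizer for the stored squared norms of term (2). The function names are ours, and the norm range [`n_min`, `n_max`] is an assumed precomputed statistic:

```python
import numpy as np

def build_luts(q, books):
    """L tables of size k holding <q, c_{l,j}> for each codebook."""
    return [C @ q for C in books]

def asym_dist_sq(luts, codes, norm_sq, q_norm_sq):
    """Formula (8): ||q||^2 - 2<q, sum_l yhat_l> + ||sum_l yhat_l||^2,
    evaluated with L table lookups plus the stored norm."""
    ip = sum(t[j] for t, j in zip(luts, codes))
    return q_norm_sq - 2.0 * ip + norm_sq

def quantize_norm(n, n_min, n_max, bits=8):
    """Uniform scalar quantization of ||yhat||^2 into `bits` length bits."""
    levels = (1 << bits) - 1
    return int(round((n - n_min) / (n_max - n_min) * levels))

def dequantize_norm(code, n_min, n_max, bits=8):
    levels = (1 << bits) - 1
    return n_min + code / levels * (n_max - n_min)

# sanity check: with the exact norm, the asymmetric distance is exact
rng = np.random.default_rng(4)
d, k, L = 6, 8, 2
books = [rng.normal(size=(k, d)) for _ in range(L)]
q = rng.normal(size=d)
codes = (3, 5)
y_hat = books[0][3] + books[1][5]
luts = build_luts(q, books)
dist = asym_dist_sq(luts, codes, float(y_hat @ y_hat), float(q @ q))
assert abs(dist - float(((q - y_hat) ** 2).sum())) < 1e-8

# 8 length bits bound the norm error by half a quantization step
n_min, n_max = 0.0, 50.0
approx = dequantize_norm(quantize_norm(12.34, n_min, n_max), n_min, n_max)
assert abs(approx - 12.34) <= (n_max - n_min) / 255 / 2 + 1e-9
```

With more length bits the quantization step shrinks, which is why AQ-n approaches the exact-norm AQ in the experiments reported later.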

Hypersphere-Based Filtration for Exhaustive ANN Search
Given a query vector q, the distances between q and the vectors in the database are computed according to formula (8) when performing exhaustive ANN search. Then, distance sorting is applied over all vectors to return a preset number of close vectors.
To reduce the number of vectors involved in distance sorting, a hypersphere can be constructed for each query vector q in the vector space. An example in 2D space is shown in Figure 3. Only the vectors lying inside the hypersphere are taken into distance sorting; the others are filtered out by the hypersphere. Thus, the remaining problem is how to determine the radius of each hypersphere.
In accumulative quantization, each codebook partitions the dataset into k clusters with each centroid as the center.
Then, the first L′ (1 ≤ L′ ≤ L) codebooks can be considered to partition the dataset into k^{L′} clusters. Each cluster center is the sum of L′ centroids, each selected from one codebook. The vectors in a cluster are usually considered similar to the center vector, but this similarity between vectors may not be transitive: in Figure 3, although center g does not lie in the sphere, there are still dots lying in the sphere.
Here, based on the first L′ codebooks of AQ, k^{L′} cluster centers can be produced. Then, for a query vector q, the w nearest cluster centers can be obtained based on the distances computed according to formula (8). Finally, the hypersphere can be constructed, where the corresponding radius is computed as

radius_q = max_{i=1,...,w} d(q, c_{q,i}),  (10)

where c_{q,i} belongs to the set containing the w nearest cluster centers of q and d(q, c_{q,i}) is computed according to formula (8).
Only the vectors whose distances to the query vector q are smaller than radius_q are put into sorting when performing exhaustive ANN search. The granularity of partitioning the dataset is finer if L′ is larger. For fixed L′, the radius of the hypersphere grows with increasing w, which results in fewer vectors being filtered out by the hypersphere.
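The filtration step can be sketched as follows. The radius rule (distance to the w-th nearest coarse center) is our reading of formula (10), consistent with the stated behavior that the radius grows with w; the function name and toy data are ours:

```python
import numpy as np

def hypersphere_filter(q, centers, db_dists, w):
    """Radius = distance to the w-th nearest coarse center (assumed
    reading of formula (10)); keep database vectors inside the sphere."""
    center_d = np.sqrt(((centers - q) ** 2).sum(-1))
    radius = float(np.sort(center_d)[w - 1])   # max over the w nearest
    keep = np.flatnonzero(db_dists <= radius)  # candidates put into sorting
    return radius, keep

q = np.zeros(2)
centers = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
db_dists = np.array([0.5, 1.5, 2.5, 9.0])      # precomputed d(q, y_i)
radius, keep = hypersphere_filter(q, centers, db_dists, w=2)
assert radius == 2.0 and list(keep) == [0, 1]
```

In the toy case, w = 2 yields radius 2.0, so only the two candidates at distances 0.5 and 1.5 reach the sorting stage; the rest are filtered out without affecting which near neighbors survive.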

Experiments
All the experiments are measured on a machine with a 16-core 2.4 GHz Xeon CPU and 16 GB RAM, except for the experiments on 1B SIFT, which use 256 GB RAM.

Datasets.
Three publicly available datasets [4], based on SIFT descriptors and GIST descriptors, are used to evaluate the performance. A SIFT descriptor encodes a small image patch, computed as a histogram of oriented gradients extracted from a gray image patch. A GIST descriptor is similar to SIFT but applied to the entire image: it applies oriented Gabor filters over different scales and averages the filter energy in each bin.
The SIFT and GIST datasets each have three subsets: a learning set, a database set, and a query set. The learning set is used to train codebooks, and the database and query sets are used for evaluating quantization performance and ANN search performance. For the SIFT dataset, the learning set is extracted from Flickr images [28] and the database and query vectors are extracted from INRIA Holidays images [29]. For the GIST dataset, the learning set consists of the tiny image set of [30], the database set is the Holidays image set combined with Flickr 1M [28], and the query vectors are extracted from the Holidays image queries [29]. All the descriptors are high-dimensional float vectors. The details of the datasets are given in Table 1.

Convergence of Codebook Training.
In training codebooks for accumulative quantization, the optimization aims to obtain more accurate codebooks, so that vectors can be approximated more precisely when quantized. For simplicity of implementation, instead of using a preset threshold, we use a fixed total number of iterations (20) as the stopping condition when optimizing codebooks. To evaluate the convergence of codebook training, this section reports the training error during codebook training on 1M SIFT and 1M GIST, including initial codebook training and codebook optimization.
When decomposing each input vector v ∈ R^d into L partial vectors, v is first divided into L subvectors of d/L dimensions; then, each subvector is extended to d dimensions by filling the other components with 0. The parameter L, the number of codebooks, ranges within {4, 8, 12, 16}. The number of centroids in each codebook is set to the typical value k = 256.
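The split-and-zero-pad decomposition described above can be sketched in a few lines (the `decompose` name is ours); by construction the L partial vectors sum back to v, as required by (3):

```python
import numpy as np

def decompose(v, L):
    """Split v into L subvectors of d/L dims, then zero-pad each back
    to d dims so that the partial vectors sum to v."""
    d = v.size
    assert d % L == 0, "d must be divisible by L"
    s = d // L
    parts = []
    for l in range(L):
        p = np.zeros(d)
        p[l * s:(l + 1) * s] = v[l * s:(l + 1) * s]
        parts.append(p)
    return parts

v = np.arange(8.0)
parts = decompose(v, L=4)
assert np.allclose(sum(parts), v)          # satisfies x_i = sum_l x_{i,l}
```

Because each partial vector lives in the full d-dimensional space (unlike PQ's subspaces), the codebooks trained on these sets can later drift away from the zero-padded pattern during the joint optimization.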
In Figure 4, iteration number 0 denotes codebook training without codebook optimization. As seen in Figure 4, the codebook optimization obviously reduces the errors produced by initial codebook training. Moreover, the proposed codebook optimization converges rapidly: the curves flatten in fewer than 5 iterations on the 1M SIFT dataset and 10 iterations on the 1M GIST dataset. We conclude that the codebook optimization effectively improves the approximation power of the codebooks. Figure 2 shows that the proposed vector quantization (vector encoding) mechanism also converges rapidly.

Quantization Performance.
This section investigates the quantization performance of our approach by evaluating the overall quantization error, measured by the MSE between vectors and their reconstructions, under different parameters k and L. The code length nbits = L·log₂(k) gives the memory required to store a vector after quantization. k ranges within {16, 64, 256}, and L ranges within {4, 8, 12, 16}. Figure 5 shows the trade-off between overall quantization error and per-vector memory usage on 1M SIFT and 1M GIST. Generally, a larger number of bits brings a lower overall quantization error, so a vector is quantized more accurately. It can also be observed from Figure 5 that the overall quantization error is reduced by increasing either k or L. Given a fixed number of bits, accumulative quantization with more centroids per codebook and fewer quantizers gains a more accurate quantization output than with fewer centroids per codebook and more quantizers. However, the former choice (larger k and smaller L) usually costs more time than the latter (smaller k and larger L) to quantize a vector.

The Influence of Parameters on ANN Search Performance.
To estimate the accuracy with which vectors are approximated by their quantization outputs, exhaustive ANN search is implemented. Recall@R is used to measure the ANN search accuracy; it is defined as the proportion of query vectors for which the nearest neighbor is ranked within the first R positions. The larger recall@R is, the better the search accuracy is. Figure 6 reports the average search time per query with k ranging within {16, 64, 256} and L within {4, 8, 12, 16}. It can be seen that the average ANN search takes more time with increasing k or L. For the same L, a larger k requires more computation to construct each lookup table; for the same k, a larger L requires more lookup tables to be constructed. Figure 7 shows the exhaustive ANN search results of AQ on 1M SIFT and 1M GIST, using recall@100 to measure search accuracy. Given k and L, the code length used to encode a vector is L·log₂(k) bits. The search accuracy improves with increasing k or L. For the same code length, the search accuracy with large k and small L is better than with small k and large L. On 1M SIFT, recall@100 reaches 1 when the code length is 96 bits. Regarding the trade-off between search accuracy and search time efficiency, Figures 6 and 7 show that a good balance is obtained with k = 256 and L = 8. These are also the typical parameter values used in the state-of-the-art references, so we use k = 256 and L = 8 in the follow-up experiments.

The Influence of Length Bits. An asymmetric distance computation is designed in formula (8) to accelerate distance computation between a query vector and database vectors in AQ-based ANN search. A uniform scalar quantization is designed to quantize the third term ‖Σ_{l=1}^{L} ŷ_{i,l}‖² with a few binary bits, reducing the storage required when computing ‖Σ_{l=1}^{L} ŷ_{i,l}‖² offline. This section investigates the influence of the number of length bits on ANN search with k = 256 and L = 8. For convenience, AQ denotes accumulative quantization-based ANN search storing the exact length of the third term, while AQ-n denotes using n bits to quantize the third term in formula (8). Tables 2 and 3 show the exhaustive ANN search accuracy of AQ and AQ-n on the 1M SIFT and 1M GIST datasets. The number of length bits determines the discrimination of the third term in formula (8), reflected by the fact that the search accuracy of AQ-n approaches that of AQ as n increases. Due to the larger dimensionality of GIST vectors compared with SIFT vectors, AQ-n needs more length bits to achieve the same search accuracy as AQ: Table 2 shows that AQ-8 and AQ have the same search accuracy on 1M SIFT, while n needs to be increased to 10 for AQ-10 to be comparable with AQ on 1M GIST.

5.4.3. The Influence of Hypersphere Parameters. The hypersphere is constructed for each query to reduce the number of vectors put into sorting, so that ANN search time efficiency can be improved without loss of search accuracy. This section evaluates the influence of L′ and w on search performance. Parameter L′ denotes using the centroids of the first L′ codebooks to build k^{L′} centers (k and L are set to the typical values 256 and 8, resp.), and parameter w denotes taking the w nearest centers among the k^{L′} centers. Figure 8 shows the search performance when applying hypersphere filtration in AQ-based exhaustive ANN search. By constructing a hypersphere for the query vector, nonsimilar vectors are effectively filtered out, so only part of the vectors are taken into distance sorting, as observed in Figure 8. Compared with the 1M GIST dataset, the hypersphere filters more nonsimilar vectors on the 1M SIFT dataset. Commonly, w = 1 gives the best filtering effect, whether L′ = 1 or 2, on both 1M SIFT and 1M GIST. Moreover, the number of filtered vectors for L′ = 2 is larger than for L′ = 1: with increasing L′, the centers composed from the first L′ codebooks come closer to the query vectors, so the hypersphere constructed from the w nearest centers becomes smaller, the number of vectors lying inside it is reduced, and the number of filtered vectors increases. The ANN search performance, in terms of recall@100 and search time per query, is detailed in Table 4. By filtering out vectors outside the constructed hypersphere, the number of vectors taken into distance sorting is reduced, so the search time decreases correspondingly. Moreover, the ANN search accuracy is not weakened compared with AQ without hypersphere filtration.
This demonstrates that hypersphere filtration-based ANN search can improve search time efficiency without weakening search accuracy.

Comparison with the State of the Art.
We compare our approach with five state-of-the-art exhaustive ANN search methods: RVQ-based exhaustive search [18], ERVQ-based exhaustive search [28], PQ-based exhaustive search [4], CQ-based exhaustive search [25], and quarter product quantization-based exhaustive search [21], indicated as RVQ, ERVQ, PQ, CQ, and QPQ, respectively. Correspondingly, our AQ-based exhaustive search method is indicated as AQ. Those five methods typically set k = 256 and L = 8 in their experiments, as detailed in references [4,18,21,25,28]; we use the same parameter settings for consistency. Figure 9 shows the comparison of exhaustive ANN search between our approach and those five methods on the 1M SIFT and 1M GIST datasets, respectively. Recall@R is used to measure ANN search accuracy, where R ranges within {1, 5, 10, 20, 50, 100}. For RVQ, ERVQ, and CQ, we use the typical parameter values given in the references, with L = 8 and k = 256 stage centroids. For PQ and QPQ, we use the typical 64 bits to quantize the vectors, where each vector is divided into 8 subvectors and each subvector is quantized with a codebook containing 256 centroids.
From Figure 9(a), it can be seen that AQ outperforms RVQ, ERVQ, PQ, and CQ under the same scale of codebooks, while AQ has ANN search accuracy comparable to QPQ. However, QPQ uses the 2 nearest centroids to approximate each subvector during quantization, so each subvector needs twice the number of bits compared with AQ. Consequently, under the same scale of codebooks, QPQ needs twice the memory of AQ to store the codes. Therefore, AQ consumes less memory than QPQ for the same recall@R.
On the 1M GIST dataset, due to the structured characteristics of GIST vectors, there is a structured version of PQ obtained by regrouping GIST vectors, named S-PQ, while natural PQ denotes PQ without regrouping. The ANN search accuracy of QPQ and natural PQ decreases more significantly than that of the other methods. Figure 9(b) shows that the ANN search accuracy of AQ is superior to that of RVQ, ERVQ, S-PQ, QPQ, and natural PQ. Comparing the curves of AQ and CQ in Figure 9(b), AQ is inferior to CQ when R < 20, while AQ outperforms CQ when R > 20.
Tables 5 and 6 detail the search accuracy and time efficiency of exhaustive ANN search for the above 6 methods, where efficiency is measured by runtime on our machine. Due to the lower dimensionality of the vectors in the SIFT dataset, all methods obtain better ANN search time efficiency on SIFT than on GIST. The search time of PQ and QPQ is slightly less than that of RVQ, ERVQ, CQ, and AQ, because PQ and QPQ use lower-dimensional subvectors to construct the lookup tables while the others use whole vectors. In ERVQ, the final number of centroids in each stage-codebook may be smaller than the preset value k, so the ANN search time efficiency of ERVQ is slightly superior to that of RVQ, CQ, and AQ, while these 3 methods have almost the same ANN search time per query.
When AQ is combined with the hypersphere filtration mechanism, ANN search time efficiency is improved due to the reduced number of vectors put into sorting. Moreover, Tables 5 and 6 show that the ANN search time efficiency of AQ with filtration is superior to that of the other methods. Table 7 shows the exhaustive ANN search performance comparison on the 1B SIFT dataset. Similar to [25], the first 1M learning vectors are used for efficient codebook training. AQ obtains the best recall@100 among the 6 methods, meaning the improvement on ANN search is consistent.
Exhaustive ANN search computes the approximate distances from the query vector to all vectors in the database and then sorts them. Therefore, compared with Table 5, under the same conditions, the performance on the 1B SIFT dataset is worse than on the 1M SIFT dataset, especially in search time, and this holds for the other methods as well. This is reasonable, as searching among a larger number of vectors is more difficult.

Conclusions
In this paper, we present an accumulative quantization method for approximate nearest neighbor search. It exploits the accumulation of centroids from several codebooks to approximate a vector. For this purpose, a codebook optimization is designed to improve the approximation ability of the codebooks by minimizing the overall quantization error. When encoding vectors offline, the quantization outputs are optimized iteratively to reduce the quantization error. Thus, the proposed accumulative quantization achieves approximate nearest neighbor search accuracy superior to the state of the art. A uniform scalar quantization is designed to reduce the storage requirement for the squared norm ‖ŷ_i‖². Empirical results show that search accuracy can be maintained with a small number of length bits. A hypersphere-based filtration is proposed to reduce the number of vectors put into sorting without influencing search accuracy. Experiments show that search time efficiency is improved compared with plain exhaustive search. In future work, we will investigate efficient nonexhaustive ANN search with AQ.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.