We propose a two-part local image descriptor EL (Edges and Lines), based on the strongest image responses to the first- and second-order partial derivatives of the two-dimensional Gaussian function. Using the steering theorems, the proposed method finds the filter orientations giving the strongest image responses. The orientations are quantized, and the magnitudes of the image responses are histogrammed. Iterative adaptive thresholding of histogram values is then applied to normalize the histogram, thereby making the descriptor robust to nonlinear illumination changes. The two-part descriptor is empirically evaluated on the HPatches benchmark for three different tasks, namely, patch verification, image matching, and patch retrieval. The proposed EL descriptor outperforms the traditional descriptors such as SIFT and RootSIFT on all three evaluation tasks and the deep-learning-based descriptors DeepCompare, DeepDesc, and TFeat on the tasks of image matching and patch retrieval.
1. Introduction
Local image descriptors represent an important area of research in computer vision. Reliable local feature matching is required in numerous applications, for example, in emerging mobile visual search (MVS) [1], panorama stitching [2], image mosaicing [3], texture classification [4], partial-duplicate web image retrieval [5], wide-base stereo [6], and object recognition [7, 8]. Computer vision researchers have proposed many types of descriptors. We can divide them into handcrafted descriptors (SIFT [9], GLOH [10], SURF [11], BRIEF [12], KAZE [13], AG [14], Max-SIFT [15], and FBRK [16]) and those based on learning (BestDaisy [17], DeepCompare [18], DeepDesc [19], TFeat [20]).
Various benchmarks and different measures and protocols are available for local image descriptor evaluation [10, 21–25]. Recently, Balntas et al. [26] introduced HPatches, a new public benchmark for evaluation of local image descriptors. It includes a large body of patches obtained from image sequences of different scenes, captured under different lighting conditions and with large changes in viewpoint. The benchmark offers an open-source implementation of protocols for evaluating local image descriptors on three different tasks: patch verification, image matching, and patch retrieval. In the same paper, the authors also show that a simple normalization of the handcrafted RootSIFT descriptor [27] can boost its performance to the level of deep-learning-based descriptors. The RootSIFT descriptor achieved the best result for the task of image matching and the second best for the task of patch retrieval. These results encouraged us to study the use of first- and second-order partial derivatives of the two-dimensional Gaussian function in defining a local image descriptor.
Local image descriptors based on higher-order image differentials, for example, the local jet [28], differential invariants [29], and steerable filters [30], have been studied before as low-dimensional point descriptors and gave poor results in evaluations [10]. Using higher-order image differentials in a different way, as histogrammed feature elements, proved much more successful. In [31], the authors propose a very simple algorithm based on the responses of a second-order bank of six Gaussian derivative filters, which classifies an image location into one of seven Basic Image Features (BIFs): near-flat locations, slope-like points, blob-like points (dark and light), lines (dark and light), and saddle-like points. Jaccard et al. [32] applied BIFs to phase-contrast microscopy image segmentation. In [33], the authors extend BIFs to oBIFs by adding local image orientation to slope-, line-, and saddle-like points: slope-like points are assigned the gradient orientation, and line- and saddle-like points an orientation perpendicular to the Hessian eigenvector corresponding to the largest eigenvalue. Experiments demonstrate that a larger feature alphabet can lead to better performance and simpler encoding of visual words. In [17], the authors break up the descriptor extraction process into a number of modules and combine them in different ways. The best descriptors are those with log-polar pooling regions and feature vectors constructed from rectified outputs of steerable quadrature filters. Unfortunately, these descriptors are high-dimensional; therefore, in [34], the authors add modules for dimensionality and dynamic range reduction.
In our approach, we use the first- and second-order partial derivatives of the two-dimensional Gaussian function, i.e., the edge and line detection filters. The proposed method forms a descriptor by histogramming and pooling magnitudes of the strongest responses to the steerable filters and uses an iterative adaptive thresholding to normalize the histogram values. The proposed descriptor achieves high mAP scores on the tasks of image matching and patch retrieval, and as such, it represents an attractive alternative to popular local descriptors.
The proposed approach is described in detail in the following sections. In the next section, we first describe how the optimal filter orientations are found and the corresponding magnitudes are computed. In the following section, we describe how the descriptor is formed. This is followed by the presentation of the experimental results and the final concluding section.
The proposed descriptor is publicly available at https://github.com/REVAMJ/ELdescriptor.
2. Extreme Responses to Edge and Line Filters
The proposed EL image descriptor is based on the first- and second-order partial derivatives of the two-dimensional Gaussian function. A useful property of the two partials is that they are orthogonal to each other: $g_x$ is odd and $g_{xx}$ is even in $x$, so their inner product vanishes.
2.1. Theory of Steerable Filters
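Our algorithm is rooted in the theory of steerable filters described in [30]. Let
$$g(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2} - \frac{y^2}{2\sigma^2}\right) \tag{1}$$
be the two-dimensional Gaussian function and
$$g_x^{0^\circ} = \frac{\partial}{\partial x} g(x,y) = -\frac{x}{\sigma^2}\, g(x,y) \tag{2}$$
be its first partial derivative in the $x$ direction (Figure 1). The same function rotated by $90^\circ$ counterclockwise is
$$g_x^{90^\circ} = \frac{\partial}{\partial y} g(x,y) = -\frac{y}{\sigma^2}\, g(x,y). \tag{3}$$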
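Figure 1. Basis filters $g_x^{0^\circ}$ and $g_x^{90^\circ}$ (first order) and $g_{xx}^{0^\circ}$, $g_{xx}^{60^\circ}$, and $g_{xx}^{120^\circ}$ (second order).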
Let $(\cdot)^{\theta}$ represent the rotation operator such that, for any function $f(x,y)$, $f^{\theta}(x,y)$ is $f(x,y)$ rotated through an angle $\theta$ about the origin. According to the theory of steerable filters, $g_x^{\theta}$ can be synthesized as a linear combination of the two basis filters $g_x^{0^\circ}$ and $g_x^{90^\circ}$:
$$g_x^{\theta} = \cos\theta\, g_x^{0^\circ} + \sin\theta\, g_x^{90^\circ}. \tag{4}$$
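Let $G_x^{0^\circ} = g_x^{0^\circ} * I$ and $G_x^{90^\circ} = g_x^{90^\circ} * I$, where $*$ is the convolution operation and $I$ is an intensity image. Because convolution is linear, the image response to $g_x^{\theta}$ can be computed directly from the two basis responses:
$$G_x^{\theta} = \cos\theta\, G_x^{0^\circ} + \sin\theta\, G_x^{90^\circ}. \tag{5}$$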
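The second partial derivative of $g(x,y)$ with respect to $x$ is
$$g_{xx}^{0^\circ} = \frac{\partial^2}{\partial x^2} g(x,y) = \left(-\frac{1}{\sigma^2} + \frac{x^2}{\sigma^4}\right) g(x,y). \tag{6}$$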
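Computing filter (6) at an arbitrary orientation requires two additional basis filters; we have chosen $g_{xx}^{60^\circ}$ and $g_{xx}^{120^\circ}$ (Figure 1). Let $G_{xx}^{0^\circ} = g_{xx}^{0^\circ} * I$, $G_{xx}^{60^\circ} = g_{xx}^{60^\circ} * I$, and $G_{xx}^{120^\circ} = g_{xx}^{120^\circ} * I$; then, according to [30], the image response to $g_{xx}^{\theta}$ is
$$G_{xx}^{\theta} = k_1 G_{xx}^{0^\circ} + k_2 G_{xx}^{60^\circ} + k_3 G_{xx}^{120^\circ}, \tag{7}$$
with interpolation functions $k_j = \frac{1}{3}\left[1 + 2\cos 2(\theta - \theta_j)\right]$, $j = 1, 2, 3$, where $\theta_1$, $\theta_2$, and $\theta_3$ are equal to $0^\circ$, $60^\circ$, and $120^\circ$, respectively.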
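The steering relations (4) and (7) can be checked numerically. The following Python/NumPy sketch samples the rotated filters on a small grid, assuming the convention that rotating a filter by $\theta$ turns its derivative direction to $(\cos\theta, \sin\theta)$, so that $g_x^{\theta} = -(u/\sigma^2)\,g$ and $g_{xx}^{\theta} = (-1/\sigma^2 + u^2/\sigma^4)\,g$ with $u = x\cos\theta + y\sin\theta$. The helper names, grid size, and test angle are illustrative choices, not part of the authors' code. Because convolution is linear, filters that steer exactly also yield responses that steer exactly, which is what (5) and (7) rely on.

```python
import numpy as np

def gaussian(x, y, sigma):
    """Two-dimensional Gaussian g(x, y) of equation (1)."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def g_x(x, y, sigma, theta):
    """g_x rotated by theta (radians): -(u / sigma^2) g, u = x cos(theta) + y sin(theta)."""
    u = x * np.cos(theta) + y * np.sin(theta)
    return -u / sigma**2 * gaussian(x, y, sigma)

def g_xx(x, y, sigma, theta):
    """g_xx rotated by theta (radians): (-1/sigma^2 + u^2/sigma^4) g."""
    u = x * np.cos(theta) + y * np.sin(theta)
    return (-1.0 / sigma**2 + u**2 / sigma**4) * gaussian(x, y, sigma)

def k(theta, theta_j):
    """Interpolation function k_j(theta) = (1 + 2 cos(2 (theta - theta_j))) / 3."""
    return (1.0 + 2.0 * np.cos(2.0 * (theta - theta_j))) / 3.0

sigma = 2.4
coords = np.arange(-10, 11, dtype=float)
X, Y = np.meshgrid(coords, coords)

theta = np.deg2rad(37.0)                # arbitrary test orientation
basis = np.deg2rad([0.0, 60.0, 120.0])  # theta_1, theta_2, theta_3

# Equation (4): steer the first-order filter from its 0 and 90 degree versions.
steered_x = (np.cos(theta) * g_x(X, Y, sigma, 0.0)
             + np.sin(theta) * g_x(X, Y, sigma, np.pi / 2))
# Equation (7) applied to the filters themselves (three basis filters).
steered_xx = sum(k(theta, tj) * g_xx(X, Y, sigma, tj) for tj in basis)

err_first = np.max(np.abs(steered_x - g_x(X, Y, sigma, theta)))
err_second = np.max(np.abs(steered_xx - g_xx(X, Y, sigma, theta)))
print(err_first, err_second)  # both at floating-point round-off level
```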
2.2. Filter Best Orientation
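At a given image location, the best filter orientation is the one that gives the strongest image response; the proposed descriptor is composed of such responses. For the first-order partial, setting $\partial G_x^{\theta}/\partial\theta = 0$ yields two filter orientations, one giving the minimum and the other the maximum response:
$$\theta_{E\min} = \operatorname{atan2}\left(-G_x^{90^\circ}, -G_x^{0^\circ}\right), \qquad \theta_{E\max} = \operatorname{atan2}\left(G_x^{90^\circ}, G_x^{0^\circ}\right). \tag{8}$$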
Three such examples are shown in Figure 2. The minimal and maximal image responses are equal in magnitude (see Figure 2(b)); therefore, it is sufficient to record just one of them: let $\theta_E = \theta_{E\max}$ and $G_E = G_x^{\theta_E}$.
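A minimal sketch of equation (8) at a single sample location, assuming the basis responses $G_x^{0^\circ}$ and $G_x^{90^\circ}$ are already available (the values below are made up). A dense sweep over $\theta$ confirms that the maximum equals $\sqrt{(G_x^{0^\circ})^2 + (G_x^{90^\circ})^2}$ and that the minimum is its negative, which is why recording $G_E$ alone loses nothing.

```python
import numpy as np

def edge_response(G0, G90):
    """Return (theta_E, G_E) from the basis responses G_x^0 and G_x^90, equation (8).

    G_x^theta = cos(theta) G0 + sin(theta) G90 peaks at theta = atan2(G90, G0).
    """
    theta_E = np.arctan2(G90, G0)
    G_E = np.cos(theta_E) * G0 + np.sin(theta_E) * G90  # equation (5) at theta_E
    return theta_E, G_E

G0, G90 = -0.8, 0.3  # made-up basis responses at one pixel
theta_E, G_E = edge_response(G0, G90)

# Brute-force check: sweep theta over a full period in 0.1 degree steps.
thetas = np.linspace(-np.pi, np.pi, 3601)
sweep = np.cos(thetas) * G0 + np.sin(thetas) * G90
print(np.degrees(theta_E), G_E, sweep.max(), sweep.min())
```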
Figure 2. Extreme image responses. Maximal image responses are denoted by red dots and minimal ones by yellow dots. (a) The green cross marks the sample location at which the image responses are computed. (b) Image response $G_x^{\theta}$; the maximal and minimal image responses are equal in magnitude. (c) The blue curve represents $G_{xx}^{\theta}$; the maximal and minimal image responses differ in magnitude and can be of opposite signs, both negative, or both positive. The cyan curve represents $-G_{xx}^{\theta}$, and the image response $-G_{xx}^{\theta_{L\min}}$ is denoted by a red dot. (d) Color marking of the patches in column (a): cyan indicates the sample locations where $-G_{xx}^{\theta_{L\min}} > G_{xx}^{\theta_{L\max}}$, and blue the remaining locations.
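In the case of the second-order partial (7), solving $\partial G_{xx}^{\theta}/\partial\theta = 0$ gives two extrema within the basic period of the function. The minimum is
$$\theta_{L\min} = \frac{1}{2}\operatorname{atan2}\left(\sqrt{3}\left(G_{xx}^{120^\circ} - G_{xx}^{60^\circ}\right),\; G_{xx}^{120^\circ} + G_{xx}^{60^\circ} - 2G_{xx}^{0^\circ}\right), \tag{9}$$
and the maximum is
$$\theta_{L\max} = \frac{1}{2}\operatorname{atan2}\left(-\sqrt{3}\left(G_{xx}^{120^\circ} - G_{xx}^{60^\circ}\right),\; -\left(G_{xx}^{120^\circ} + G_{xx}^{60^\circ} - 2G_{xx}^{0^\circ}\right)\right). \tag{10}$$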
The Appendix explains how they are derived. The minimal and maximal image responses differ in magnitude (see Figure 2(c)); both can be positive, both can be negative, or they can be of opposite signs. If we wanted to record both values, $G_{xx}^{\theta_{L\max}}$ and $G_{xx}^{\theta_{L\min}}$, we would have to distinguish between positive $G_{xx}^{\theta_{L\max}}$, negative $G_{xx}^{\theta_{L\max}}$, positive $G_{xx}^{\theta_{L\min}}$, and negative $G_{xx}^{\theta_{L\min}}$. Each of these four cases requires its own representation (in the case of orientation binning, a separate histogram), which results in a long descriptor. Instead, we choose to discard part of this information. At the minimum $\theta_{L\min}$, we compute the image response to the negated basis filters $-g_{xx}^{0^\circ}$, $-g_{xx}^{60^\circ}$, and $-g_{xx}^{120^\circ}$, which gives $-G_{xx}^{\theta_{L\min}}$. We can interpret $g_{xx}^{\theta_{L\max}}$ as a dark-line detection filter and $-g_{xx}^{\theta_{L\min}}$ as a light-line detection filter. Figure 2(c) shows the image responses obtained by the positive and negative basis filters. The procedure is as follows. From equations (9) and (10), we compute $\theta_{L\min}$ and $\theta_{L\max}$, and then from equation (7) the image responses $-G_{xx}^{\theta_{L\min}}$ and $G_{xx}^{\theta_{L\max}}$. The two image responses are compared (see Figure 2(d)), and only the larger one is retained, together with the corresponding filter orientation:
$$G_L = \max\left(-G_{xx}^{\theta_{L\min}},\; G_{xx}^{\theta_{L\max}}\right), \qquad \theta_L = \begin{cases} \theta_{L\min}, & \text{if } -G_{xx}^{\theta_{L\min}} > G_{xx}^{\theta_{L\max}},\\ \theta_{L\max}, & \text{otherwise.} \end{cases} \tag{11}$$
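Notice that $G_L$ is never negative: since $G_{xx}^{\theta_{L\min}} \le G_{xx}^{\theta_{L\max}}$, whenever $-G_{xx}^{\theta_{L\min}}$ is negative, $G_{xx}^{\theta_{L\max}}$ is positive.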
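The line part can be sketched in the same way. The function below is a hypothetical helper, not the authors' implementation: it evaluates equations (9) and (10) using $A$ and $B$ as defined in (A.6), steers the response with (7), and applies the selection rule (11). A dense sweep over $\theta$ checks that the orientation returned by (9) is indeed the minimum; the basis responses are made-up values.

```python
import numpy as np

def steer_second(theta, G0, G60, G120):
    """Equation (7): response to g_xx at orientation theta (radians, scalar or array)."""
    ks = [(1 + 2 * np.cos(2 * (theta - tj))) / 3
          for tj in np.deg2rad([0.0, 60.0, 120.0])]
    return ks[0] * G0 + ks[1] * G60 + ks[2] * G120

def line_response(G0, G60, G120):
    """Return (theta_L, G_L, is_min) following equations (9)-(11)."""
    A = G60 + G120 - 2 * G0               # A of equation (A.6)
    B = np.sqrt(3) * (G120 - G60)         # B of equation (A.6)
    theta_min = 0.5 * np.arctan2(B, A)    # equation (9)
    theta_max = 0.5 * np.arctan2(-B, -A)  # equation (10)
    neg_min = -steer_second(theta_min, G0, G60, G120)  # light-line response
    pos_max = steer_second(theta_max, G0, G60, G120)   # dark-line response
    if neg_min > pos_max:                 # selection rule (11)
        return theta_min, neg_min, True
    return theta_max, pos_max, False

G0, G60, G120 = 0.2, -0.5, 0.1  # made-up basis responses at one pixel
theta_L, G_L, is_min = line_response(G0, G60, G120)

# Brute-force check over one basic period (g_xx repeats every 180 degrees).
thetas = np.linspace(-np.pi / 2, np.pi / 2, 1801)
sweep = steer_second(thetas, G0, G60, G120)
print(np.degrees(theta_L), G_L, is_min, sweep.min(), sweep.max())
```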
3. Descriptor Representation
The proposed descriptor is composed of two parts. The first part uses θE and GE, and the second part uses θL and GL. For simplicity, we name the new descriptor EL (Edges and Lines), because the filters used actually detect edges and lines. Descriptor construction requires the following three steps: orientation binning, orientation pooling, and histogram normalization.
3.1. Orientation Binning
The orientation binning of $(\theta_E, G_E)$ is performed in the same way as in SIFT. We quantize orientation into eight histogram bins corresponding to the angles $\theta = -180^\circ, -135^\circ, -90^\circ, -45^\circ, 0^\circ, 45^\circ, 90^\circ$, and $135^\circ$. At each sample location, we construct a histogram vector of length eight. A filter response $(\theta_E, G_E)$ contributes to the two orientation bins adjacent to $\theta_E$, with the value $G_E$ distributed linearly between them. Thus, if $\theta_E$ lies between the bins with angles $\theta_i$ and $\theta_j$, then bin $\theta_i$ receives the contribution $G_E(\theta_j - \theta_E)/(\theta_j - \theta_i)$ and bin $\theta_j$ receives the remainder, $G_E(\theta_E - \theta_i)/(\theta_j - \theta_i)$. Here, $\theta_j - \theta_E$ denotes the angular distance between the two angles (with wrap-around).
The orientation binning of $\theta_L$ differs slightly from that of $\theta_E$. Due to the symmetry of the filter, $g_{xx}^{0^\circ} = g_{xx}^{180^\circ}$, we quantize orientation into only four bins corresponding to the angles $\theta = -90^\circ, -45^\circ, 0^\circ$, and $45^\circ$. At each sample location, we construct two histogram vectors of length four: one for the case $\theta_L = \theta_{L\min}$ and one for $\theta_L = \theta_{L\max}$. The vector components are obtained by distributing $G_L$ to the two bins adjacent to $\theta_L$ in the same way as described above for $G_E$.
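Both binning schemes amount to linear interpolation between the two nearest bin centers on a circle. The sketch below assumes angles in degrees and bin centers at $-P/2 + iP/m$ for period $P$ and $m$ bins ($P = 360^\circ$, $m = 8$ for $\theta_E$; $P = 180^\circ$, $m = 4$ for $\theta_L$); `soft_bin` is a name chosen here for illustration. For the L part, $G_L$ is routed into only one of the two 4-bin vectors, depending on whether (11) selected $\theta_{L\min}$ or $\theta_{L\max}$.

```python
import numpy as np

def soft_bin(theta_deg, magnitude, n_bins, period_deg):
    """Split `magnitude` linearly between the two bins adjacent to `theta_deg`.

    Bin centers are -period/2 + i * period / n_bins for i = 0 .. n_bins - 1,
    and angles wrap around with the given period (360 for E, 180 for L).
    """
    step = period_deg / n_bins
    pos = ((theta_deg + period_deg / 2) / step) % n_bins  # fractional bin index
    i = int(np.floor(pos)) % n_bins  # guard: pos may round up to exactly n_bins
    frac = pos - np.floor(pos)
    hist = np.zeros(n_bins)
    hist[i] += (1 - frac) * magnitude           # nearer bin below theta
    hist[(i + 1) % n_bins] += frac * magnitude  # next bin, with wrap-around
    return hist

# Edge part: 22.5 degrees lies halfway between the 0 and 45 degree bins.
print(soft_bin(22.5, 1.0, n_bins=8, period_deg=360))

# Line part at one pixel (made-up values): only one 4-vector receives G_L.
theta_L, G_L, is_min = 30.0, 0.5, True
h_min = soft_bin(theta_L, G_L, 4, 180) if is_min else np.zeros(4)
h_max = np.zeros(4) if is_min else soft_bin(theta_L, G_L, 4, 180)
print(h_min, h_max)
```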
3.2. Orientation Pooling
The vectors from the previous stage are summed spatially, weighted with Gaussian weights according to their distance from the pooling centers. As recommended by Winder and Brown [17], we use 17 pooling centers and three different Gaussian weighting functions, as illustrated in Figure 3.
Figure 3. Polar arrangement of seventeen Gaussian summation regions. Circles indicate one standard deviation. For a $65\times65$ patch, the standard deviations are $\sigma_0 = 3$, $\sigma_1 = 5.5$, and $\sigma_2 = 9.75$. The centers of the Gaussian summation regions lie on rings with radii $r_0 = 0$, $r_1 = 14.5$, and $r_2 = 31.5$. All measures are in pixels.
After orientation pooling, each pooling center is represented by one vector of length eight, representing the image responses to the first-order partial derivative, and two vectors of length four, representing the image responses to the second-order partial derivative. The vectors from all pooling centers are then concatenated into a single vector, the local image descriptor $D = (d_1, d_2, \ldots, d_n)$ with $n = 17 \times (8 + 4 + 4) = 272$.
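A sketch of the pooling step under explicit assumptions: the 17 centers are taken as one central region plus eight regions on each of the two outer rings, spaced $45^\circ$ apart (Figure 3 gives the radii and standard deviations, but the angular layout used here is our assumption), and the Gaussian weights are left unnormalized. Each sample location is assumed to carry a 16-element vector, the 8-bin E histogram followed by the two 4-bin L histograms; `pooling_centres` and `pool` are hypothetical names.

```python
import numpy as np

def pooling_centres(size=65):
    """Assumed layout: 1 center + 8 centers on r1 = 14.5 + 8 centers on r2 = 31.5."""
    c = (size - 1) / 2.0
    centres = [(c, c, 3.0)]  # (x, y, sigma) of the central region
    for radius, sigma in [(14.5, 5.5), (31.5, 9.75)]:
        for a in np.deg2rad(np.arange(0, 360, 45)):
            centres.append((c + radius * np.cos(a), c + radius * np.sin(a), sigma))
    return centres

def pool(per_pixel_hist, size=65):
    """Gaussian-weighted spatial pooling of per-pixel histograms.

    per_pixel_hist has shape (size, size, 16); the result is the concatenation
    of 17 pooled 16-vectors, i.e., a 272-dimensional vector.
    """
    rows, cols = np.mgrid[0:size, 0:size].astype(float)
    pooled = []
    for cx, cy, sigma in pooling_centres(size):
        w = np.exp(-((cols - cx) ** 2 + (rows - cy) ** 2) / (2 * sigma**2))
        # Weighted sum over all pixels, keeping the 16 histogram entries.
        pooled.append(np.tensordot(w, per_pixel_hist, axes=([0, 1], [0, 1])))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
descriptor = pool(rng.random((65, 65, 16)))  # random stand-in for real histograms
flat = pool(np.ones((65, 65, 16)))           # constant input: a center's 16 entries are equal
print(descriptor.shape)
```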
3.3. Descriptor Normalization
The descriptor is normalized to reduce the effects of linear and nonlinear illumination changes. The normalization algorithm used by EL starts with an adaptive iterative thresholding of the descriptor components, which repeats the following three steps ten times:
1. The average value of the descriptor components is calculated, $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$.
2. The threshold is calculated, $T = t_C \cdot \bar{d}$.
3. Each descriptor component is thresholded to be no larger than $T$.
The constant $t_C = 2.6$ in step 2 was determined experimentally. We then follow the approach used by RootSIFT [27], which uses a square root (Hellinger) kernel instead of the standard Euclidean distance to measure the similarity between SIFT descriptors:
4. The descriptor is normalized to have unit L1 norm.
5. Each descriptor component is replaced by its square root.
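Steps 1 to 5 translate into a few lines of NumPy. This is a sketch (the function name `el_normalize` is ours), assuming a nonnegative input descriptor; a guard for an all-zero descriptor has been added. Because step 5 takes the square root of an L1-normalized vector, the output always has unit L2 norm.

```python
import numpy as np

def el_normalize(d, t_c=2.6, iterations=10):
    """Iterative adaptive thresholding (steps 1-3), then L1 norm and square root (steps 4-5)."""
    d = np.asarray(d, dtype=float).copy()
    for _ in range(iterations):
        threshold = t_c * d.mean()    # steps 1 and 2
        d = np.minimum(d, threshold)  # step 3: clip components above T
    total = d.sum()
    if total > 0:                     # guard against an all-zero descriptor
        d /= total                    # step 4: unit L1 norm
    return np.sqrt(d)                 # step 5: square root (Hellinger kernel)

out = el_normalize(np.array([10.0, 1.0, 1.0, 1.0]))  # one dominant component
print(out, np.sum(out**2))
```

Repeating the clipping lets the threshold follow the shrinking mean, so a single dominant component is suppressed more strongly than a one-step clip would suppress it.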
The graph in Figure 4 demonstrates the effect of the proposed iterative adaptive thresholding. The truncated descriptor components of corresponding patches are significantly more similar than with the normalization used by RootSIFT (circles on the ordinate) or with single-step adaptive thresholding (iteration no. 1).
Figure 4. Final values of the truncated descriptor components (after step 5) for different numbers of adaptive thresholding iterations (steps 1 to 3), computed for six corresponding patches. The colored circles on the ordinate represent the final values of the truncated descriptor components obtained with RootSIFT normalization.
4. Results and Discussion
The proposed approach is evaluated on the HPatches dataset (https://github.com/hpatches/hpatches-dataset) described in [26], which provides more than 2.5 million pre-extracted patches of size $65\times65$ pixels from 116 image sequences of different scenes. Changes in the images are due to varying scene lighting conditions and camera viewpoints (Figure 5). For each image sequence, patches are detected in the reference image and projected onto the target images using the ground-truth homographies. The detections are perturbed by increasing amounts of geometric noise, resulting in three patch sets of increasing difficulty: easy, hard, and tough.
Figure 5. Examples of image pairs from full-image sequences in the HPatches dataset [26]: i_fenis, i_dome, i_leuven, v_there, v_graffiti, and v_sunseason.
The authors also define the evaluation protocol and present its open-source implementation for fair comparison of local image descriptors on three different tasks: patch verification, image matching, and patch retrieval. In this work, we strictly follow the proposed protocol and use the provided implementation for evaluation of our approach and comparison with the related work.
The proposed descriptor is computed with the Gaussian function $g(x,y)$ of equation (1) with $\sigma = 2.4$ pixels, threshold constant $t_C = 2.6$ (see Section 3.3), and the descriptor footprint shown in Figure 3.
4.1. Evaluation of the Proposed Descriptor
First, we evaluate the proposed descriptor denoted by EL and its individual parts denoted by E and L. Figure 6 shows the results in terms of the mean Average Precision (mAP(%)) for three different tasks (patch verification, image matching, and patch retrieval) on three different sets (easy, hard, and tough). Results are also shown for the postprocessed variants of descriptors +EL, +E, and +L obtained by applying ZCA whitening with clipped eigenvalues, followed by power law normalization and L2 normalization [26].
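For readers unfamiliar with the postprocessing of [26], the sketch below shows the general shape of such a pipeline: ZCA whitening learned on a set of training descriptors, with small eigenvalues clipped to avoid amplifying noise, followed by signed power-law normalization and L2 normalization. The clipping rule (a floor at a fraction of the largest eigenvalue), the exponent 0.5, and the function name are assumptions for illustration; the exact settings of the HPatches toolbox may differ.

```python
import numpy as np

def zca_postprocess(train, test, eig_floor=0.1, alpha=0.5):
    """Illustrative '+' postprocessing: clipped ZCA whitening, power law, L2 norm.

    eig_floor (fraction of the largest eigenvalue) and alpha are assumed values.
    """
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    eigval = np.maximum(eigval, eig_floor * eigval.max())   # clip small eigenvalues
    W = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T  # symmetric ZCA matrix
    z = (test - mean) @ W
    z = np.sign(z) * np.abs(z) ** alpha                     # signed power-law normalization
    return z / np.linalg.norm(z, axis=1, keepdims=True)     # L2 normalization

rng = np.random.default_rng(1)
train = rng.random((500, 32))  # stand-ins for training descriptors
test = rng.random((10, 32))
out = zca_postprocess(train, test)
print(out.shape, np.linalg.norm(out, axis=1))
```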
Figure 6. Results in mAP(%) on three tasks (patch verification, image matching, and patch retrieval) obtained with different variants of the EL, E, and L descriptors. The suffixes "s" and "r" in the descriptor names refer to the descriptor normalization used by SIFT and RootSIFT, respectively. Dashed bar borders and "+" indicate ZCA-projected and normalized features.
We can observe that the combined EL descriptor produces higher scores than both single-component descriptors E and L. Descriptor L gives lower mAP scores than E, for at least two reasons. In the case of a blob, the filter $g_{xx}^{\theta}$ used by L gives an equal response for all filter orientations $\theta$, so the best filter orientation is not well defined. In the case of a saddle, the minimal and maximal responses of $g_{xx}^{\theta}$ have the same magnitude; due to different image deformations, the algorithm may choose the minimum in one patch and the maximum in the corresponding one. Both situations increase errors in the descriptor components. The differences in scores between the two-part descriptor +EL and the one-part descriptor +E for the tasks of patch verification, image matching, and patch retrieval are 2.15, 3.59, and 4.52 percentage points (pp), respectively.
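A short derivation makes the two failure cases explicit. It uses the standard identity that a derivative-of-Gaussian filter applied to $I$ equals the corresponding derivative of the smoothed image $J = g * I$ ($J$ is notation introduced here, not in the original). Steering $g_{xx}$ then yields the second directional derivative of $J$:
$$G_{xx}^{\theta} = J_{xx}\cos^2\theta + 2J_{xy}\sin\theta\cos\theta + J_{yy}\sin^2\theta = \frac{J_{xx} + J_{yy}}{2} + \frac{J_{xx} - J_{yy}}{2}\cos 2\theta + J_{xy}\sin 2\theta,$$
whose extreme values over $\theta$ are the eigenvalues of the Hessian of $J$,
$$\lambda_{\pm} = \frac{J_{xx} + J_{yy}}{2} \pm \sqrt{\left(\frac{J_{xx} - J_{yy}}{2}\right)^2 + J_{xy}^2}.$$
At the center of an isotropic blob, $J_{xx} = J_{yy}$ and $J_{xy} = 0$, so the square root vanishes and $G_{xx}^{\theta}$ does not depend on $\theta$. At a saddle with $J_{xx} + J_{yy} = 0$, $\lambda_{+} = -\lambda_{-}$, so the two candidates in (11), $-G_{xx}^{\theta_{L\min}}$ and $G_{xx}^{\theta_{L\max}}$, are equal and small deformations decide which one wins.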
We also wanted to verify the contribution of the iterative adaptive thresholding in the normalization step, so we compared it with the fixed thresholding used by the SIFT and RootSIFT descriptors. We replaced our normalization approach, described by steps 1–5 in Section 3.3, with the normalization used by SIFT and by RootSIFT (with the fixed threshold $T = 0.12$), and denote the resulting variants of the descriptor as EL-s and EL-r, respectively. For the task of image matching, applying the Hellinger kernel (EL-r) improves mAP over EL-s by 8.02 pp, and the proposed iterative adaptive thresholding (EL) improves it by an additional 1.51 pp. For the postprocessed versions, the improvements are 5.51 and 1.88 pp. The iterative adaptive thresholding also improves the scores for the task of patch retrieval. We conclude that the iterative adaptive thresholding used by EL is beneficial for the tasks of image matching and patch retrieval.
4.2. Comparison to Related Work
We compare the performance of the proposed descriptor with the previously published results on HPatches [26]. Table 1 shows scores obtained with two established handcrafted descriptors SIFT [9] and RootSIFT [27], and the deep-learning-based descriptors DC-S and DC-S2S [18], DDESC [19], and TF-M and TF-R [20].
Table 1. Verification, matching, and retrieval results obtained with the EL, SIFT, and RootSIFT descriptors and the deep-learning-based descriptors DC-S and DC-S2S [18], DDESC [19], and TF-M and TF-R [20]. All results are reported in mAP(%). For each task, the first column gives the score of the basic method and the second that of its ZCA variant [26]. The best result in each column is given in bold.

| Descriptor | Verification (basic) | Verification (ZCA) | Matching (basic) | Matching (ZCA) | Retrieval (basic) | Retrieval (ZCA) |
|---|---|---|---|---|---|---|
| SIFT | 65.12 | 74.35 | 25.47 | 32.76 | 31.98 | 40.36 |
| RootSIFT | 58.53 | 76.70 | 27.22 | 36.77 | 33.65 | 43.84 |
| DC-S | 70.04 | 81.63 | 24.92 | 31.65 | 34.84 | 39.68 |
| DC-S2S | 78.23 | 83.03 | 27.69 | 32.34 | 34.76 | 38.23 |
| DDESC | 79.51 | 81.65 | 28.05 | 35.44 | 39.83 | 44.55 |
| TF-M | 81.90 | 82.69 | 30.61 | 34.29 | 39.40 | 40.02 |
| TF-R | **81.92** | **83.24** | 32.64 | 34.37 | 37.69 | 40.23 |
| EL | 73.36 | 79.99 | **36.92** | **43.00** | **40.68** | **48.12** |
For all three tasks, EL and its postprocessed variant outperform SIFT and RootSIFT. For the tasks of image matching and patch retrieval, EL also achieves better scores than the deep-learning-based descriptors. The EL descriptor is particularly suitable for the task of image matching: its score of 36.92% is higher even than the scores obtained by the postprocessed variants of all other descriptors, and its postprocessed variant +EL achieves an even better score, outperforming all other descriptors by more than 6 pp. Similar to SIFT and RootSIFT, the EL descriptor is less suited to the task of patch verification.
We expect that pooling across different scales in addition to spatial locations, as proposed in [34, 35], could improve the obtained scores even further. However, here we limit our evaluation to a single scale.
Table 2 shows the dimensionality, size of the measurement region in pixels, and extraction time of each descriptor. Note that EL, E, and L are implemented in Matlab; therefore, their time efficiency should be interpreted with caution; a more efficient implementation and code optimization could further speed up the descriptor extraction process.
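Table 2. Basic properties of the evaluated descriptors. The speed is measured in thousands of descriptors extracted per second.

| Descriptor | Dimensions | Patch size (px) | CPU speed (10³ descriptors/s) |
|---|---|---|---|
| SIFT | 128 | 65 | 2 |
| RootSIFT | 128 | 65 | 2 |
| DC-S | 256 | 64 | 0.3 |
| DC-S2S | 512 | 64 | 0.2 |
| DDESC | 128 | 64 | 0.1 |
| TF-M | 128 | 32 | 0.6 |
| TF-R | 128 | 32 | 0.6 |
| EL | 272 | 65 | 0.4 |
| E | 136 | 65 | 1 |
| L | 136 | 65 | 0.6 |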
5. Conclusions
We propose a two-part descriptor, named EL (Edges and Lines), based on the maximal image responses to the first- and second-order partial derivatives of the two-dimensional Gaussian function. The maximal image responses are calculated by using the steering theorems [30]. In this way, we complement the understanding of oBIFs [33]. The two parts of the proposed descriptor, E and L, are of equal size; each one contains 136 values.
To increase the robustness of the descriptor to nonlinear illumination changes and to increase the impact of lower-contrast regions, the fixed thresholding of descriptor components used by SIFT and RootSIFT is replaced by an iterative adaptive thresholding.
The proposed descriptor was evaluated on the HPatches benchmark for three tasks, namely, patch verification, image matching, and patch retrieval. The postprocessed variant of the EL descriptor, obtained by applying ZCA whitening with clipped eigenvalues followed by power law normalization and L2 normalization, outperforms the postprocessed variant of the RootSIFT descriptor on all three tasks and the tested deep-learning-based descriptors on the tasks of image matching and patch retrieval.
One of our goals was also to explore the contribution of the second-order partial derivatives of the two-dimensional Gaussian function. The experimental results show improved scores for all three tasks. The largest improvement was obtained for the task of patch retrieval, where mAP improved by 4.52 percentage points. Given this improvement, we recommend including both parts in a local image descriptor.
Overall, the results are very favorable, especially for the tasks of image matching and patch retrieval, which are required in many computer vision applications. It is also worth noting that the proposed approach is simple and rests on the solid mathematical foundation of the theory of steerable filters. The EL descriptor is publicly available at https://github.com/REVAMJ/ELdescriptor.
Appendix
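Equation (7) uses three interpolation functions $k_j = \frac{1}{3}\left[1 + 2\cos 2(\theta - \theta_j)\right]$, $j = 1, 2, 3$, with $\theta_1$, $\theta_2$, and $\theta_3$ equal to $0^\circ$, $60^\circ$, and $120^\circ$, respectively. The three interpolation functions can be expressed in the following forms:
$$k_1 = \frac{1}{3}\left(1 + 2\cos 2\theta\right), \tag{A.1}$$
$$k_2 = \frac{1}{3}\left[1 + 2\cos 2(\theta - 60^\circ)\right] = \frac{1}{3}\left[1 + 2\left(\cos 2\theta\cos 120^\circ + \sin 2\theta\sin 120^\circ\right)\right] = \frac{1}{3}\left(1 - \cos 2\theta + \sqrt{3}\sin 2\theta\right), \tag{A.2}$$
$$k_3 = \frac{1}{3}\left[1 + 2\cos 2(\theta - 120^\circ)\right] = \frac{1}{3}\left[1 + 2\left(\cos 2\theta\cos 240^\circ + \sin 2\theta\sin 240^\circ\right)\right] = \frac{1}{3}\left(1 - \cos 2\theta - \sqrt{3}\sin 2\theta\right). \tag{A.3}$$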
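Using (A.1), (A.2), and (A.3) in equation (7), we obtain
$$G_{xx}^{\theta} = \frac{1}{3}\left(1 + 2\cos 2\theta\right) G_{xx}^{0^\circ} + \frac{1}{3}\left(1 - \cos 2\theta + \sqrt{3}\sin 2\theta\right) G_{xx}^{60^\circ} + \frac{1}{3}\left(1 - \cos 2\theta - \sqrt{3}\sin 2\theta\right) G_{xx}^{120^\circ}. \tag{A.4}$$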
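The filter orientation that gives the strongest image response is found from the first derivative of (A.4):
$$\begin{aligned} \left(G_{xx}^{\theta}\right)' &= \frac{2}{3}\left(-2\sin 2\theta\right) G_{xx}^{0^\circ} + \frac{2}{3}\left(\sin 2\theta + \sqrt{3}\cos 2\theta\right) G_{xx}^{60^\circ} + \frac{2}{3}\left(\sin 2\theta - \sqrt{3}\cos 2\theta\right) G_{xx}^{120^\circ} \\ &= \frac{2}{3}\sin 2\theta\left(-2G_{xx}^{0^\circ} + G_{xx}^{60^\circ} + G_{xx}^{120^\circ}\right) - \frac{2}{3}\sqrt{3}\cos 2\theta\left(G_{xx}^{120^\circ} - G_{xx}^{60^\circ}\right) \\ &= \frac{2}{3}A\sin 2\theta - \frac{2}{3}B\cos 2\theta, \end{aligned} \tag{A.5}$$
with $A$ and $B$ equal to
$$A = -2G_{xx}^{0^\circ} + G_{xx}^{60^\circ} + G_{xx}^{120^\circ}, \qquad B = \sqrt{3}\,G_{xx}^{120^\circ} - \sqrt{3}\,G_{xx}^{60^\circ}. \tag{A.6}$$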
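To find the extrema, we solve $\left(G_{xx}^{\theta}\right)' = 0$, which gives two solutions within the basic period of the function:
$$\tan 2\theta_1 = \frac{B}{A}, \tag{A.7}$$
$$2\theta_1 = \operatorname{atan2}(B, A), \tag{A.8}$$
$$\tan 2\theta_2 = \frac{-B}{-A}, \tag{A.9}$$
$$2\theta_2 = \operatorname{atan2}(-B, -A). \tag{A.10}$$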
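To determine whether an extremum is a minimum or a maximum, we examine the second derivative: a minimum requires $\left(G_{xx}^{\theta}\right)'' > 0$ and a maximum $\left(G_{xx}^{\theta}\right)'' < 0$. The second derivative is
$$\left(G_{xx}^{\theta}\right)'' = \left(\frac{2}{3}A\sin 2\theta - \frac{2}{3}B\cos 2\theta\right)' = \frac{4}{3}A\cos 2\theta + \frac{4}{3}B\sin 2\theta. \tag{A.11}$$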
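For the minimum, we obtain the condition $4A\cos 2\theta + 4B\sin 2\theta > 0$ or, equivalently,
$$\tan 2\theta > -\frac{A}{B} \quad \text{if } B > 0 \text{ and } \cos 2\theta > 0, \text{ or } B < 0 \text{ and } \cos 2\theta < 0, \tag{A.12}$$
$$\tan 2\theta < -\frac{A}{B} \quad \text{if } B > 0 \text{ and } \cos 2\theta < 0, \text{ or } B < 0 \text{ and } \cos 2\theta > 0, \tag{A.13}$$
and for the maximum
$$\tan 2\theta < -\frac{A}{B} \quad \text{if } B > 0 \text{ and } \cos 2\theta > 0, \text{ or } B < 0 \text{ and } \cos 2\theta < 0, \tag{A.14}$$
$$\tan 2\theta > -\frac{A}{B} \quad \text{if } B > 0 \text{ and } \cos 2\theta < 0, \text{ or } B < 0 \text{ and } \cos 2\theta > 0. \tag{A.15}$$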
Before we continue, we note that $\cos 2\theta_1$, with $2\theta_1$ given by equation (A.8), is positive when $A$ is positive and negative when $A$ is negative. Conversely, $\cos 2\theta_2$, with $2\theta_2$ given by equation (A.10), is positive when $A$ is negative and negative when $A$ is positive. Solution (A.8) therefore satisfies condition (A.12) or (A.13), depending on the signs of $B$ and $\cos 2\theta_1$, so $\theta_1$ is a minimum, whereas solution (A.10) satisfies condition (A.14) or (A.15), so $\theta_2$ is a maximum.
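The same conclusion can be reached more directly by substituting the solutions into (A.11). Writing $R = \sqrt{A^2 + B^2}$ (a symbol introduced here for brevity), equation (A.8) gives $\cos 2\theta_1 = A/R$ and $\sin 2\theta_1 = B/R$, whereas (A.10) gives the negatives of both, so that, whenever $R > 0$,
$$\left(G_{xx}^{\theta}\right)''\Big|_{\theta_1} = \frac{4}{3}\,\frac{A^2 + B^2}{R} = \frac{4}{3}R > 0, \qquad \left(G_{xx}^{\theta}\right)''\Big|_{\theta_2} = -\frac{4}{3}R < 0.$$
Similarly, (A.4) can be rewritten as $G_{xx}^{\theta} = m - \frac{1}{3}\left(A\cos 2\theta + B\sin 2\theta\right)$ with $m = \frac{1}{3}\left(G_{xx}^{0^\circ} + G_{xx}^{60^\circ} + G_{xx}^{120^\circ}\right)$, which gives the extreme responses
$$G_{xx}^{\theta_1} = m - \frac{R}{3}, \qquad G_{xx}^{\theta_2} = m + \frac{R}{3}.$$
The minimal and maximal responses are therefore symmetric about the mean $m$, and they have equal magnitudes exactly when $m = 0$.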
Data Availability
The HPatches dataset used to support the findings of this study is publicly available at https://github.com/hpatches/hpatches-dataset and is described in the paper "HPatches: A benchmark and evaluation of handcrafted and learned local descriptors" (DOI: 10.1109/CVPR.2017.410). The code we wrote is also publicly available at https://github.com/REVAMJ/ELdescriptor.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Slovenian Research Agency (grant number P2-0214).
References
[1] S. Bianco, D. Mazzini, D. P. Pau, and R. Schettini, "Local detectors and compact descriptors for visual search: a quantitative comparison."
[2] M. Brown and D. G. Lowe, "Automatic panoramic image stitching using invariant features."
[3] D. Ghosh and N. Kaabouch, "A survey on image mosaicing techniques."
[4] X. Yu, Y. Zhang, and H. Wang, "A novel local human visual perceptual texture description with key feature selection for texture classification."
[5] W. Zhou, Y. Lu, H. Li, Y. Song, and Q. Tian, "Spatial coding for large scale partial-duplicate web image search," Proceedings of the 18th ACM International Conference on Multimedia, New York, NY, USA, October 2010, pp. 511–520.
[6] Y. Dou, K. Hao, Y. Ding, and M. Mao, "A mean-shift-based feature descriptor for wide baseline stereo matching."
[7] K. Mele, D. Šuc, and J. Maver, "Local probabilistic descriptors for image categorisation."
[8] L. Xie, J. Wang, W. Lin, B. Zhang, and Q. Tian, "Towards reversal-invariant image representation."
[9] D. G. Lowe, "Distinctive image features from scale-invariant keypoints."
[10] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors."
[11] H. Bay, A. Ess, T. Tuytelaars, and L. van Gool, "Speeded-up robust features (SURF)."
[12] M. Calonder, V. Lepetit, M. Ozuysal, T. Trzcinski, C. Strecha, and P. Fua, "BRIEF: computing a local binary descriptor very fast."
[13] P. F. Alcantarilla, A. Bartoli, and A. J. Davison, "KAZE features," Proceedings of the 12th ECCV, Florence, Italy, October 2012, pp. 214–227.
[14] R. Mandeljc and J. Maver, "AGs: local descriptors derived from the dependent effects model."
[15] L. Xie, Q. Tian, and B. Zhang, "Max-SIFT: flipping invariant descriptors for web logo search," Proceedings of the 18th ACM International Conference on Multimedia, Mountain View, CA, USA, June 2014, pp. 5716–5720.
[16] L. Yang and Z. Lu, "A new scheme for keypoint detection and description."
[17] S. Winder and M. Brown, "Learning local image descriptors," Proceedings of the IEEE Conference CVPR, Rio de Janeiro, Brazil, July 2007, pp. 1–8.
[18] S. Zagoruyko and N. Komodakis, "Learning to compare image patches via convolutional neural networks," Proceedings of the IEEE Conference CVPR, Boston, MA, USA, June 2015, pp. 4353–4361.
[19] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer, "Discriminative learning of deep convolutional feature point descriptors," Proceedings of the IEEE ICCV, Santiago, Chile, December 2015, pp. 118–126.
[20] V. Balntas, E. Riba, D. Ponsa, and K. Mikolajczyk, "Learning local feature descriptors with triplets and shallow convolutional neural networks," Proceedings of the British Machine Vision Conference, York, UK, September 2016.
[21] H. Aanæs, A. L. Dahl, and K. S. Pedersen, "Interesting interest points."
[22] J. Heinly, E. Dunn, and J.-M. Frahm, "Comparative evaluation of binary features," in A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid (eds.).
[23] S. Madeo and M. Bober, "Fast, compact, and discriminative: evaluation of binary descriptors for mobile applications."
[24] P. Moreels and P. Perona, "Evaluation of features detectors and descriptors based on 3D objects."
[25] J. L. Schoenberger, H. Hardmeier, T. Sattler, and M. Pollefeys, "Comparative evaluation of hand-crafted and learned local features," Proceedings of the Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, July 2017.
[26] V. Balntas, K. Lenc, A. Vedaldi, and K. Mikolajczyk, "HPatches: A benchmark and evaluation of handcrafted and learned local descriptors," Proceedings of the IEEE Conference CVPR, Honolulu, Hawaii, July 2017.
[27] R. Arandjelović and A. Zisserman, "Three things everyone should know to improve object retrieval," Proceedings of the IEEE Conference CVPR, Providence, RI, USA, June 2012, pp. 2911–2918.
[28] J. Koenderink and A. van Doorn, "Representation of local geometry in the visual system."
[29] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and M. A. Viergever, "General intensity transformations and differential invariants."
[30] W. T. Freeman and E. H. Adelson, "The design and use of steerable filters."
[31] L. D. Griffin and M. Lillholm, "Symmetry sensitivities of derivative-of-Gaussian filters."
[32] N. Jaccard, N. Szita, and L. D. Griffin, "Segmentation of phase contrast microscopy images based on multi-scale local basic image features histograms."
[33] M. Lillholm and L. Griffin, "Novel image feature alphabets for object recognition," Proceedings of the 19th IEEE Conference ICPR, Tampa, FL, USA, December 2008, pp. 1–4.
[34] S. Winder, G. Hua, and M. Brown, "Picking the best daisy," Proceedings of the IEEE Conference CVPR, Miami, FL, USA, August 2009, pp. 178–185.
[35] J. Dong and S. Soatto, "Domain-size pooling in local descriptors: DSP-SIFT," Proceedings of the IEEE Conference CVPR, Boston, MA, USA, June 2015, pp. 5097–5106.