Current binary descriptors suffer from high computational complexity, a lack of affine invariance, and a high false matching rate under viewpoint changes. To address these problems, a new binary affine invariant descriptor, called BAND, is proposed. Unlike other descriptors, BAND uses an irregular pattern based on the local affine invariant region surrounding a feature point, and it has five orientations, which are obtained efficiently by LBP. Finally, a 256-bit binary string is computed with a simple random sampling pattern. Experimental results demonstrate that BAND matches well under rotation, image zooming, noise, lighting changes, and small-scale perspective transformation. It achieves better matching performance than current mainstream descriptors while taking less time.
The local feature descriptor is at the core of many computer vision technologies, such as object recognition, image retrieval, and 3D reconstruction. Designing a local feature descriptor that combines excellent performance with low complexity is an important and difficult research problem. Many scholars have proposed a variety of descriptors in this area, such as SIFT (Scale Invariant Feature Transform) [
For these problems, many improved algorithms were proposed. SURF (Speeded Up Robust Features) [
To some extent, the above-mentioned descriptors improve performance, but they still have high computational complexity because of the local histogram statistics. Besides, each dimension of the descriptor is a real-valued number, so the descriptors need a lot of memory. All of this makes them difficult to deploy in low-power, low-memory applications. In recent years, many binary descriptors have appeared, such as the BRIEF (Binary Robust Independent Elementary Features) descriptor [
We describe the three key steps of BAND in this section: the affine invariant radius builder, the multidirectional rotation invariant builder, and the simple random sampling pattern builder.
There are many algorithms that extract the local affine invariant regions, such as MSER (Maximally Stable Extremal Regions) [
By calculating the affine invariant radius, the original circular pattern is deformed into an irregularly shaped pattern. It is shown in Figure
(a) A circular pattern and (b) an irregularly shaped one. The red balls are the sampling points.
To obtain rotation invariance, existing algorithms need to determine the direction of the local area and rotate the axes accordingly. Existing descriptors show three defects in this process. First, the BRIEF descriptor ignores rotation, so many outliers appear under large-angle rotation. Second, algorithms such as SIFT and ORB use a rectangular pattern, which makes the local orientation difficult to obtain and leads to high computational complexity. Third, algorithms such as FREAK and BRISK, although they use a circular pattern, consider only one local orientation per pattern.
Based on the above three points and with reference to BRISK and FREAK, the pattern of the BAND descriptor is constructed as follows: five circles concentric with the feature point are built, with 16 sampling points distributed uniformly on each circle. The affine invariant radius of every circle is obtained by the affine invariant radius builder.
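The pattern layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sampling_points` is hypothetical, and the idea that the affine invariant radius builder supplies one radius per sampling direction is an assumption made here to show how a circular pattern could become irregular. The radii used in the example are placeholders, not values from the paper.

```python
import math

def sampling_points(center, radii_per_circle, points_per_circle=16):
    """Return sampling points for concentric (possibly deformed) circles.

    radii_per_circle[c][k] is the radius used for direction k on circle c.
    Constant radii give a circular pattern; direction-dependent radii
    (an assumption of this sketch) give an irregular pattern.
    """
    cx, cy = center
    pattern = []
    for radii in radii_per_circle:
        if len(radii) != points_per_circle:
            raise ValueError("one radius per sampling direction is required")
        circle = []
        for k, r in enumerate(radii):
            # Directions are spaced uniformly around the feature point.
            theta = 2.0 * math.pi * k / points_per_circle
            circle.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
        pattern.append(circle)
    return pattern

# Placeholder radii: five circles with constant radius 1..5 (plain circular pattern).
circular = sampling_points((0.0, 0.0), [[float(c + 1)] * 16 for c in range(5)])
```

With five circles of 16 points each, the pattern contains 80 sampling points in total.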
Firstly, obtain the locations of sampling points on every circle by affine invariant radius:
Secondly, according to literature [
Multidirectional rotation invariant builder. The arrows represent the five local orientations. We can see that the pattern is irregular.
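For readers unfamiliar with LBP, the following Python sketch shows the standard way a ring of samples is turned into a binary string and made rotation invariant by choosing the circular shift with the smallest binary value. It is a generic LBP illustration, not the authors' exact procedure: the function names are hypothetical, and treating the chosen shift as a coarse orientation index for one circle is an assumption of this sketch.

```python
def lbp_code(center_value, ring_values):
    """Binary LBP string: bit k is 1 when ring sample k is >= the center."""
    return [1 if v >= center_value else 0 for v in ring_values]

def rotation_invariant_shift(bits):
    """Return (shift, rotated_bits) for the circular shift that gives the
    smallest binary value (most significant bit first)."""
    best_shift, best_bits = 0, list(bits)
    for s in range(1, len(bits)):
        rotated = bits[s:] + bits[:s]
        # Equal-length 0/1 lists compare lexicographically, which matches
        # comparing the binary numbers they represent.
        if rotated < best_bits:
            best_shift, best_bits = s, rotated
    return best_shift, best_bits

# Placeholder intensities for 16 samples on one circle and its center.
ring = [10, 10] + [0] * 13 + [10]
bits = lbp_code(5, ring)
shift, canonical = rotation_invariant_shift(bits)
```

Rotating the ring of samples changes `shift` but leaves `canonical` unchanged, which is the property a rotation invariant builder relies on.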
This step is very simple. We obtain a 256-bit vector by sampling point pairs from the five LBP coding strings randomly and in order. The bit-vector descriptor is assembled by performing all the comparisons of point pairs (
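A minimal Python sketch of a random pair sampling step is given below. It assumes 80 sampled values (five circles of 16 points), a fixed random seed so that every feature point uses the same 256 pairs, and a BRIEF-style comparison rule (bit set when the first value exceeds the second). The function names, the seed, and the comparison rule are assumptions of this sketch, not details taken from the paper.

```python
import random

def make_pairs(num_points=80, num_pairs=256, seed=0):
    """Draw a fixed, reproducible set of distinct index pairs (a < b)."""
    rng = random.Random(seed)  # fixed seed: identical pairs for every feature point
    all_pairs = [(a, b) for a in range(num_points) for b in range(a + 1, num_points)]
    return rng.sample(all_pairs, num_pairs)

def describe(values, pairs):
    """Pack the pairwise comparisons into one integer.

    Bit i is 1 when values[a] > values[b] for the i-th pair (assumed rule).
    """
    descriptor = 0
    for i, (a, b) in enumerate(pairs):
        if values[a] > values[b]:
            descriptor |= 1 << i
    return descriptor

pairs = make_pairs()
# Placeholder sample values: strictly decreasing, so every comparison is true.
descriptor = describe(list(range(80, 0, -1)), pairs)
```

Storing the descriptor as a single integer keeps it compact and makes the later Hamming distance computation a single XOR followed by a bit count.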
The programming environment is Matlab 2013, Visual Studio 2010, and OpenCV 2.4.4. The datasets come from [
Each of the datasets contains a sequence of six images exhibiting an increasing amount of transformation. All comparisons here are performed against the first image in each dataset. Figure
Datasets used for evaluation: blur (Trees), JPEG compression (Ubc), rotation (Rome), and brightness changes (Leuven, Office, and Day_Night).
Trees
Ubc
Rome
Leuven
Office
Day_Night
The transformations cover blur (Trees), brightness changes (Leuven, Office, and Day_Night), JPEG compression (Ubc), and rotation (Rome). Match rate is defined as the ratio of the number of correctly matched points to the total number of matched points. To demonstrate how easily our algorithm can be integrated, we use FAST [
Evaluation results: match rate for BAND, ORB, BRISK, BRIEF, SIFT, and SURF.
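The match rate used in this evaluation is a simple ratio, as the short Python sketch below shows. The function name and the counts in the example are hypothetical; how a match is judged correct is not specified here and is left outside the sketch.

```python
def match_rate(num_correct, num_matched):
    """Ratio of correctly matched points to all matched points."""
    if num_matched == 0:
        return 0.0  # no matches at all: report 0 rather than divide by zero
    return num_correct / num_matched

# Placeholder counts: 45 correct matches out of 50 matched points.
rate = match_rate(45, 50)
```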
We only test single performance of BAND descriptor in Section
The six pairs of outdoor scenes all show buildings and artifacts.
Church
Brussels
Venice
Semper
Rathaus
Fountain
Figure
The results show the matched points; no mismatch removal is applied.
Church
Brussels
Venice
Semper
Rathaus
Fountain
This figure shows the numbers of matched point pairs from different binary descriptors.
This figure shows the match rates of different binary descriptors in the 6 outdoor scenes.
We use SURF to detect the feature points, and the threshold is the same as before. The six pairs of outdoor scenes are as shown in Figures
On the Matlab 2013 platform, we compare the time consumption of the SIFT and BAND descriptors per point pair. The test result is shown in Table
The time consumption comparison between SIFT and BAND descriptor per point pair.
| | SIFT | BAND |
|---|---|---|
| Description time (ms) | 531.1 | 0.5463 |
| Matching time (ms) | 0.093 | 0.0015 |
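As a quick consistency check on the table above, the ratios between the SIFT and BAND timings can be computed directly. The Python sketch below uses only the four values from the table; the variable names are illustrative.

```python
# Values copied from the table above (milliseconds).
sift_description_ms, band_description_ms = 531.1, 0.5463
sift_matching_ms, band_matching_ms = 0.093, 0.0015

# How many times faster BAND is than SIFT in each stage.
description_speedup = sift_description_ms / band_description_ms
matching_speedup = sift_matching_ms / band_matching_ms

# BAND description time expressed as a percentage of the SIFT time.
band_share_percent = 100.0 * band_description_ms / sift_description_ms

print(f"description speedup: {description_speedup:.0f}x")
print(f"matching speedup: {matching_speedup:.0f}x")
print(f"BAND description time as a share of SIFT: {band_share_percent:.2f}%")
```

The description time ratio is roughly 970 to 1, which agrees with the statement that BAND needs about 0.1% of the SIFT description time, and the matching time ratio is about 62 to 1.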
If the time BAND takes to describe one feature point is set to 1, the relative time consumption of the other descriptors is as given in Table
Time consumption comparison of different descriptors.
| Time consumption | BAND | ORB | BRISK | BRIEF | SIFT | SURF |
|---|---|---|---|---|---|---|
| Feature description | 1 | 8.5 | 5.7 | 8.3 | 1000 | 65 |
In the single performance verification, BAND is shown to adapt better than the other descriptors. In all six experiments, the degree of transformation increases across the image sequence. It is noteworthy that Leuven shows a gradual change in brightness from normal to dark. Similarly, Office shows a change in brightness from dark to normal. In contrast, Day_Night shows a change in brightness from normal to dark, but some local areas become lighter than their surroundings because of light sources such as lamps, bulbs, and candles. In other words, the brightness changes nonlinearly. Since the similarity is reduced, the direction of every line in Figure
In the outdoor scene matching experiment, the ORB descriptor yields many matched points, but its correct match rate is lower than that of the BAND descriptor. One negative effect is that more computation is needed to remove the mismatched points. Although BAND yields fewer matched points than ORB, the number remains high and its correct match rate is satisfactory: it is better than or as good as that of the other descriptors. The reason is that BAND takes affine invariance into account, and its pattern, whose shape is variable, adapts better to changes of viewpoint.
In the time consumption comparison, the description time per point of BAND is approximately 0.1% of that of SIFT. The reason is that BAND neither builds Gaussian pyramids nor performs extensive fitting computation. BAND measures the distance between two descriptors with the Hamming distance, whereas SIFT uses the Euclidean distance. The Hamming distance is cheaper to compute, since it needs only bitwise operations instead of floating-point arithmetic. Taken together, BAND is more efficient. The comparison with the other descriptors also shows that BAND is faster. Therefore, BAND has an advantage in situations that demand high computing speed.
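The difference between the two distance measures can be illustrated with a short Python sketch. It assumes a binary descriptor stored as one Python integer and a real-valued descriptor stored as a list of floats; the function names and the example descriptors are hypothetical.

```python
import math

def hamming_distance(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")  # XOR marks differing bits, then count them

def euclidean_distance(u, v):
    """Euclidean distance between two real-valued descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Placeholder descriptors.
d_binary = hamming_distance(0b1011, 0b0010)
d_real = euclidean_distance([0.0, 3.0], [4.0, 0.0])
```

The Hamming distance needs one XOR and a bit count per descriptor pair, while the Euclidean distance needs a subtraction, a multiplication, and an addition per dimension, followed by a square root.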
Experimental results show that the BAND descriptor has significant advantages, such as low computational complexity, good adaptability, and good stability. It compensates for the disadvantages of other descriptors, namely high computational complexity and a lack of affine invariance. BAND matches well under rotation, image zooming, noise, lighting changes, and small-scale perspective transformation. More specifically, it yields a moderate number of matched points and a high correct match rate. As a result, BAND can improve computational efficiency while preserving accuracy, and it can meet real-time requirements.
The authors declare that there is no conflict of interests regarding the publication of this paper.