TreeBASIS Feature Descriptor and Its Hardware Implementation


Although robust and innovative, most associated algorithms cannot be considered hardware friendly. Implementing these most commonly used descriptors in custom logic circuits on a Field Programmable Gate Array (FPGA) is a challenging task, since a full floating-point unit is required. For this reason, hardware implementations of these feature descriptors typically offload the actual descriptor computations to a CPU for floating-point operations [10] or they rely on a simplified algorithm (e.g., using fixed-point computation) that maps more readily to hardware [4, 12].
The main challenge with commonly used feature descriptors is that they require substantial memory and computing power. More recently proposed descriptors address these drawbacks. New schemes based on intensity comparisons include Binary Robust Independent Elementary Features (BRIEF) [13, 14], Binary Robust Invariant Scalable Keypoints (BRISK) [15], and Speeded Up Surround Extrema (SUSurE) [16]. BRIEF and BRISK use much smaller descriptors and faster feature detectors than SIFT and SURF; BRIEF has shown improved performance over SIFT and SURF for object recognition. Other approaches based on dimensional reduction techniques have been developed, such as Linear Discriminant Embedding (LDE) [17] and Principal Component Analysis (PCA) [18]. Another alternative to SIFT and SURF is a specific set of 256 learned pixel pairs selected for reducing correlation among the binary tests [19].
We previously described a non-floating-point algorithm called BASIS that was designed specifically for FPGA implementation [20]. The sparse coding basis dictionaries used by BASIS are nonorthogonal, and individual basis images may contain redundant information [21] that reduces the accuracy and increases the descriptor size. To reduce the redundant information, the BASIS descriptor can be improved by implementing a novel computation method based on a vocabulary tree. The new TreeBASIS algorithm that we present in this work improves and extends this idea, and it outperforms BASIS in both resource usage and accuracy. With significantly reduced descriptor size, increased efficiency, and simplified calculations, it is fully implementable on a limited-resource FPGA platform while providing better feature descriptor accuracy than previously proposed schemes.
The TreeBASIS descriptor is designed specifically for hardware implementation, so no simplifications that could compromise performance in realized systems are required. Although our target system is a small FPGA device based on a Virtex-6 VLX760 FPGA, the approach is well suited for a wide variety of commonly used devices. Our vision system will be used on a small unmanned ground vehicle (UGV) and an unmanned aerial vehicle (UAV) for image stabilization, rectification, and pose estimation.
This paper describes the development and implementation of the TreeBASIS feature descriptor. The approach uses a derivative of sparse coding with a novel vocabulary tree-based descriptor computation method. A vocabulary tree is created by using a small basis dictionary to partition a training set of feature region images (FRIs), small pixel regions surrounding detected feature points. The vocabulary tree is computed off-line and stored in memory for on-line descriptor computation and matching. Basis dictionary images (BDIs) are quantized, and their intensity values are represented using a binary vector. TreeBASIS feature descriptors are computed by quantizing an FRI, passing it through the tree, and recording its path.
Section 2 presents an overview of the TreeBASIS descriptor. Section 3 describes hardware components that perform the associated computation and matching algorithms, including detailed synthesis reports and discussion on FPGA size and memory requirements. Section 4 includes a discussion of conclusions and future work.

TreeBASIS
In the TreeBASIS approach, a vocabulary tree is created using a small basis dictionary to partition a training set of FRIs. This vocabulary tree is computed off-line and stored for on-line descriptor computation and matching. TreeBASIS computes feature descriptors by passing an FRI through the tree and recording its path. Descriptors are matched between images by traversing the descriptor-paths of features from the first image and comparing each node to the descriptor-paths of the features from the second image.
TreeBASIS consists of three major components: building the vocabulary tree, computing descriptors, and matching descriptors between two images. The tree can be created off-line on a standard desktop computer that is completely separate from the on-line system. Once the tree has been created, it can be loaded into memory and used in real-time on a low-resource platform to compute TreeBASIS descriptors and match them in subsequent images.

Off-Line Tree Creation.
The off-line tree creation stage of TreeBASIS utilizes a dictionary of basis images returned from the K-SVD sparse coding algorithm [22] to partition a training set of FRIs (30 × 30 pixel regions surrounding a detected feature) denoted as $F$. Sparse coding theory states that if K-SVD is trained using a very large dataset of images, the basis dictionary, $B$, returned by K-SVD can be utilized to reconstruct a great assortment of natural images [23]. An example of such a basis dictionary is shown in Figure 1, obtained from the GoogleMaps dataset. Sparse coding theory suggests that this basis dictionary would be useful in describing any such set of natural images.
The dictionary set, $B$, is used to partition the training set, $F$, consisting of at least 50,000 FRIs, each 30 × 30 pixels. Each FRI is taken from an image or video frame near a feature point detected by the feature detector. The 30 × 30 region size was empirically determined to produce the best results. Larger regions tend to contain too much data, or content not associated with the feature point. Smaller regions tend to retain too little useful information and result in FRIs that are not unique. Experiments using the hardware implementation of the BASIS descriptor show a reduction in accuracy if the size of the FRI is reduced [20].
When the training set of images is chosen to be similar to the images expected during real-time operation, the feature matching accuracy can improve [24]. However, the generic nature of the basis dictionary images, $B$, and the FRIs provides some invariance to the differences between the training set images and the images the algorithm may see in real-time. The TreeBASIS algorithm will result in a high level of feature matching accuracy on completely different image datasets, even if $F$ consists of more generic training images.
In order to reduce computation time, the BDIs and FRIs are divided into subregions and the average value in each subregion is calculated. Based on that average value, binary thresholding is performed (as shown in Figure 2). The first step in thresholding a BDI or FRI is computing the global average gray value of the entire 30 × 30 region as $\bar{g} = \frac{1}{N} \sum_{x, y} I(x, y)$, where $N$ is the number of pixels in the image (900 in this case) and $I(x, y)$ is the intensity value at pixel $(x, y)$. Next, the FRI or BDI is divided into 100 subregions of 3 × 3 pixels, and the intensity values of the pixels in each subregion are averaged. If the resulting average intensity of a 3 × 3 subregion is greater than the global average intensity $\bar{g}$, the subregion is set to one; otherwise the subregion is set to zero. This results in a 100-element (single bit) binary quantized vector that is then used in place of the original BDI or FRI. This operation provides two benefits. The comparison against $\bar{g}$ provides invariance to shadows, shading, and highlights. By using a single-bit binary vector, the number of comparisons at each stage of the computation is reduced, along with drastically reducing the memory footprint of each BDI and FRI.
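The thresholding step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `quantize` and the row-major ordering of the output bits are assumptions. The averages are compared by cross-multiplication so that only integer arithmetic is needed.

```python
def quantize(region, block=3):
    """Quantize a square gray-scale region into one bit per block x block subregion.

    `region` is a list of rows of integer intensities (30 x 30 in the text).
    Bit ordering (row-major over subregions) is an assumption of this sketch.
    """
    size = len(region)
    n = size * size                       # number of pixels (900 for 30 x 30)
    total = sum(sum(row) for row in region)
    bits = []
    for by in range(0, size, block):
        for bx in range(0, size, block):
            s = sum(region[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block))
            # subregion mean > global mean  <=>  s / block^2 > total / n,
            # rewritten without division to stay in integer arithmetic
            bits.append(1 if s * n > total * block * block else 0)
    return bits
```

For a 30 × 30 input the result has 100 entries; a region of uniform intensity quantizes to all zeros because no subregion exceeds the global average.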
Relative to the original BASIS algorithm, this approach requires significantly less computation and memory to compare an FRI from an image with a BDI from the dictionary. In addition to quantizing the BDIs and FRIs, another major advantage TreeBASIS provides is that it utilizes a binary tree structure to avoid comparing each FRI with every BDI in the dictionary.
The tree creation process is illustrated in Figure 3. Each node of the tree is created by taking a set of training FRIs (starting with $F$) and determining the most effectively descriptive BDI (EDBDI), $b_{ED}$, from the dictionary $B$. An EDBDI is the BDI from a dictionary that most evenly partitions a given set of FRIs. The EDBDIs are the BDIs that most effectively portray critical feature characteristics because half of the FRIs match them closely, while the other half do not. By building a tree in this manner, the number of FRI-BDI comparisons is drastically reduced. Moreover, the criteria for inclusion in the tree ensure that FRIs will only be compared with those BDIs most likely to help distinguish this FRI from all other FRIs that may appear.
Given sets $F = \{f_0 \cdots f_n\}$ and $B = \{b_0 \cdots b_m\}$, we define the entropy over set $F$ with respect to $b$ ($b \in B$) in terms of the two subsets, $F_L$ and $F_R$, into which $b$ divides $F$, where $h(f, b)$ is the Hamming distance between $f$ and $b$. Using (2), the EDBDI for a node of the tree is the $b \in B$ which most evenly divides $F$. $b_{ED}$ is chosen as the EDBDI for the node, and the remaining elements of $B$ are used with $F_L$ to create the left child and with $F_R$ to create the right child. Each node stores the $b_{ED}$ it used for splitting (the quantized version of $b_{ED}$) and a node number unique to the entire tree.
The process continues until the remaining subsets of $F$ can no longer be split. Since there is no guarantee that any $b_i$ in the set $B$ will evenly divide the set $F$, the partitions at each node are not guaranteed to be perfectly even. Because of this, the tree is not guaranteed to be balanced. To split the node as evenly as possible, the best $b_i$ from the entire set $B$ is chosen. The completed tree is saved to disk so that it can be reloaded for on-line processing.
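The recursive construction described above can be sketched as follows. This is a hedged illustration under several assumptions that are not taken from the paper: the split quality is measured by the size imbalance between the two subsets (a stand-in for the entropy measure), the score compared with the threshold is the number of agreeing bits, the threshold is supplied by the caller, and nodes are plain dictionaries with `None` marking a subset that cannot be split further. All names are hypothetical.

```python
def agreement(a, b):
    """Number of positions where two equal-length bit lists agree (assumed score)."""
    return sum(x == y for x, y in zip(a, b))


def build_tree(fris, dictionary, threshold):
    """Recursively pick the dictionary entry that splits `fris` most evenly."""
    best = None
    for idx, bdi in enumerate(dictionary):
        left = [f for f in fris if agreement(f, bdi) <= threshold]
        right = [f for f in fris if agreement(f, bdi) > threshold]
        if not left or not right:
            continue                      # this entry does not split the set
        imbalance = abs(len(left) - len(right))
        if best is None or imbalance < best[0]:
            best = (imbalance, idx, left, right)
    if best is None:
        return None                       # subset can no longer be split
    _, idx, left, right = best
    return {
        "bdi": idx,                       # index of the chosen entry, not the vector
        "left": build_tree(left, dictionary, threshold),
        "right": build_tree(right, dictionary, threshold),
    }
```

Storing only the dictionary index in each node mirrors the point made in the text that tree nodes reference an EDBDI rather than copy it.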
Because an efficient tree structure is used, the training set $F$ can be very large while still maintaining fast real-time comparison speeds. A binary tree of depth $n$ contains up to $2^n$ unique root-to-leaf paths. With 64 EDBDIs the entire set of 50,000 FRIs can be partitioned with a tree depth of 17 levels [24]. A binary tree with 17 levels contains up to $2^{17} - 1$ nodes, meaning each of the 64 EDBDIs could be used, on average, over 2,000 times in the tree. This demonstrates how effectively these BDIs partition feature region images. Even though the tree contains up to $2^{17} - 1$ nodes, the on-line portion of the TreeBASIS algorithm only needs to hold the data for 64 EDBDIs. The individual tree nodes simply contain a reference to which EDBDI is used, along with pointers to the left and right children.
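As a quick check of the reuse figure quoted above, assuming a full 17-level tree in which the 64 EDBDIs are used equally often:

```latex
\[
\frac{2^{17} - 1}{64} = \frac{131{,}071}{64} \approx 2{,}048
\]
```

An unbalanced tree has fewer nodes, so this is an upper bound on the average reuse per EDBDI.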
For any given application, the corresponding $B$ and $F$ are passed to the algorithm, which creates a tree that is saved to disk. Each node in the tree contains only an index indicating which of the 64 BDIs is used for partitioning and pointers to its left and right children. When the tree is reloaded from disk, the quantized BDI vector associated with each index can be loaded from memory when it is needed for comparison.

Calculating Descriptors.
Once the tree structure is created, descriptors can be quickly computed in real-time. Figure 4 summarizes the on-line portion of the TreeBASIS descriptor algorithm. (More details can be found in [24].) The algorithm takes a list of features detected using the FAST feature detector [25] and returns a descriptor for each feature. The FAST detector does not offer a dominant orientation or scale measurement. When considering frame-to-frame feature matching for UAV or UGV applications, scale changes and rotation between frames are small and can be ignored. This allows us to use the FAST detector and avoid trade-offs associated with providing increased invariance [26].
For each detected point, an FRI is obtained. Each FRI is binary thresholded over the same number of subregions as the BDIs, and the resulting vector is passed into the tree. A Hamming distance $h$ is again computed between the binary FRI vector and the binary EDBDI vector that is saved at the given node.
We then check the similarity between the FRI and the EDBDI at the node. If they are very similar, the FRI is passed to the right child; if they are dissimilar, it is passed to the left child. Both outcomes are recorded, since the fact that an FRI differs from a given BDI is just as informative as knowing that the FRI is very similar to a BDI. Rather than keeping a unique node number, a simple binary value indicating a left branch (0) or right branch (1) at each node is sufficient to retain the entire path's information. The resulting descriptor length is therefore equal to the depth of the tree, because each level requires only one bit (0 or 1) to record whether the FRI went down the left branch or the right branch. A 17-level tree will produce descriptor-paths that are at most 17 bits long, compared to the 2,304 bits used in a BASIS descriptor and the 8,192 bits used to represent the double-precision floating-point values of a SIFT descriptor.
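The path-recording step can be sketched as follows. This is a minimal illustration, assuming nodes are dictionaries with a dictionary index and two children (`None` where the path ends), that the similarity score is the number of agreeing bits so that a close match goes right as the text describes, and that the threshold is supplied by the caller. The names are hypothetical.

```python
def agreement(a, b):
    """Number of positions where two equal-length bit lists agree (assumed score)."""
    return sum(x == y for x, y in zip(a, b))


def compute_descriptor(fri_bits, tree, dictionary, threshold):
    """Walk the tree and record one bit per level: 0 = left, 1 = right."""
    path = []
    node = tree
    while node is not None:
        score = agreement(fri_bits, dictionary[node["bdi"]])
        bit = 1 if score > threshold else 0   # similar -> right, dissimilar -> left
        path.append(bit)
        node = node["right"] if bit else node["left"]
    return path
```

The returned list has one entry per node visited, so its length never exceeds the depth of the tree.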

Comparing Descriptors.
Tree structures are commonly used to reduce the number of comparisons during matching. The advantage of the TreeBASIS algorithm is that the tree is already used to compute the descriptor, so an additional tree for matching is not required. As explained earlier, each node in the BASIS tree is represented by an EDBDI, which represents a feature characteristic. Each node therefore represents a comparison of an FRI against that characteristic, together with its result. Consequently, if two features' descriptors follow the same path, they must contain the same feature characteristics represented by the EDBDIs.
Conversely, if they do not follow the same path, then they differ on a given feature characteristic and do not match. It is very important to determine at what depth of the tree the paths diverge. During the tree creation, the training set of FRIs is split in half at the root node. Features that diverge at the root differ more markedly than features diverging at a node much deeper in the tree.
The partitions at each node are not guaranteed to be perfectly even, so not all paths in the tree progress to the deepest possible level. Because of this, descriptors may have variable lengths. The distance between two features is computed from the paths that are traversed. Initially, the distance is set to a large value. If the path elements are equal, the distance is reduced by an amount that depends on $i$ and $n$, where $i$ is the length of the path traversed so far and $n$ is the total path length of the shorter descriptor path. For identical descriptors, the distance computation returns zero. If the descriptors differ at any element along the path, the distance computation halts and the current value of the distance $d$ is returned. In this way, the distance reflects how much of the two descriptors' paths is shared. Before the on-line portion begins, the maximum depth of the tree is known, which helps to allocate enough memory to store a maximum-depth path.
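The quantity on which this distance depends, the depth at which two descriptor-paths diverge, can be sketched as below. This is only the divergence depth; it does not reproduce the distance formula itself, and the function name is hypothetical.

```python
def divergence_depth(path_a, path_b):
    """Number of leading path elements shared by two descriptor-paths.

    Comparison stops at the first differing element or at the end of the
    shorter path, since descriptors may have different lengths.
    """
    depth = 0
    for bit_a, bit_b in zip(path_a, path_b):
        if bit_a != bit_b:
            break
        depth += 1
    return depth
```

A depth of zero means the two features already separate at the root, which the text identifies as the most marked kind of difference.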

TreeBASIS Hardware Implementation
This section describes the development of actual hardware components that execute computation and matching algorithms using the TreeBASIS descriptor. No changes need to be made to the algorithm to implement it in hardware. The matching results of the bit-level-accurate software version are identical to those of the hardware version. The TreeBASIS algorithm was implemented in VHDL and simulated, including the complete process of calculating feature descriptors and matching two descriptors. We discuss these simulation results and the size and clock rate of the resulting hardware realized on an FPGA.

TreeBASIS Descriptor.
TreeBASIS, as described in Section 2, computes feature descriptors as the path an FRI takes as it is pushed down a precomputed vocabulary tree. The vocabulary tree is computed off-line on a standard desktop computer and then saved to disk. During on-line processing, the system loads a copy of the vocabulary tree and uses the data in the tree to determine which EDBDIs need to be compared to the FRI in question, and then the branching choices are recorded as a descriptor. The off-line portion of the TreeBASIS software algorithm is used to provide a tree for the on-line portion of the TreeBASIS hardware implementation. The on-line portion of the TreeBASIS algorithm performs Hamming distance calculations, which can be implemented in hardware using XNOR gates and an adder. The binary quantization of the FRIs and BDIs is also fully implementable in hardware without mathematical simplifications. Matching two descriptors requires a basic comparison of two 100-bit vectors, which can be implemented with basic hardware comparators and adders. Therefore, no mathematical or other simplifications are necessary to implement the TreeBASIS descriptor in hardware. Because no simplifications are required, the accuracy of the hardware implementation is identical to that of the bit-level-accurate software version.

Descriptor Hardware Implementation.
Figure 4 shows the top-level hardware layout of the TreeBASIS descriptor computation system. It consists of three major processing cores: a feature detector, a binary quantizer, and a tree processor. Additionally, it contains four large memory structures: image BRAM, feature vector FIFO, BDI vector ROM, and tree ROM. The image BRAM (640 × 480 × 8 bits) contains a copy of the image as it is passed out of the feature detector. The feature vector FIFO (2,000 120-bit words) holds up to 2,000 feature vectors. Each word includes the 20 bits of $(x, y)$ location information and the 100-bit binary quantized version of the 30 × 30 pixel FRI. The BDI vector ROM (BRAM, with 64 100-bit words) contains the binary quantized versions of all 64 EDBDIs that are used in the tree.
The tree ROM (BRAM, with up to $2^{17}$ 41-bit words) holds the entire BASIS tree. The tree ROM has a word size of 41 bits, consisting of 4 elements per word. The first 6 bits are the address in the BDI ROM of the quantized EDBDI vector for the current node. The next 34 bits are the addresses in memory of the left and right children (17 bits each), and the least significant bit indicates whether or not this node is a leaf. Table 1 shows a few lines from a tree ROM built from a BASIS tree with 17 levels. The three major processing cores are discussed below.

Feature Detector.
Feature detectors [9, 27–31] and their implementations are discussed later in this paper. As such, an assumption is made that the feature detector used will output the current image pixel as the image is streaming into the system, along with the $(x, y)$ coordinates of the next valid feature and a valid feature signal.
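The 41-bit tree ROM word layout can be illustrated with a small pack/unpack pair. This is a sketch under stated assumptions: the 6-bit BDI address occupies the most significant bits and the leaf flag the least significant bit, as the text says, while placing the left child address above the right one within the 34-bit field is a guess. The function names are hypothetical.

```python
BDI_BITS = 6     # address into the 64-entry BDI ROM
ADDR_BITS = 17   # one child address; two of them fill the 34-bit field
ADDR_MASK = (1 << ADDR_BITS) - 1


def pack_node(bdi_addr, left_addr, right_addr, is_leaf):
    """Pack one tree node into a 41-bit integer: [bdi | left | right | leaf]."""
    assert 0 <= bdi_addr < (1 << BDI_BITS)
    assert 0 <= left_addr <= ADDR_MASK and 0 <= right_addr <= ADDR_MASK
    word = bdi_addr
    word = (word << ADDR_BITS) | left_addr
    word = (word << ADDR_BITS) | right_addr
    word = (word << 1) | int(bool(is_leaf))
    return word


def unpack_node(word):
    """Inverse of pack_node; returns (bdi_addr, left_addr, right_addr, is_leaf)."""
    is_leaf = bool(word & 1)
    right_addr = (word >> 1) & ADDR_MASK
    left_addr = (word >> (1 + ADDR_BITS)) & ADDR_MASK
    bdi_addr = (word >> (1 + 2 * ADDR_BITS)) & ((1 << BDI_BITS) - 1)
    return bdi_addr, left_addr, right_addr, is_leaf
```

The field widths add up to 6 + 17 + 17 + 1 = 41 bits, matching the stated word size.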

Binary Quantizer.
Figure 5 shows a graphical illustration of the binary quantizer core. The binary quantizer takes in a 30 × 30 pixel region centered around a feature point and performs two major operations. First, it sums and computes the average gray value of the entire 30 × 30 pixel area. Second, it binary thresholds each 3 × 3 subregion based on a comparison of that subregion's average to the global average. In a software implementation, this quantization is done after the entire image is saved into memory, and the 900 pixels are loaded from memory and operated on. In contrast, a hardware implementation can take advantage of the real-time streaming data flow to process these quantized values while the image is being captured.
The binary quantizer takes as input signals the $x$ and $y$ coordinates of a feature and a valid feature signal. While the image is streaming into the system pixel-by-pixel and row-by-row, the feature detector is processing features and passing each pixel through to the binary quantizer. The feature detector also maintains an $(x, y)$ counter denoting the position of the current pixel in the image. When the feature detector determines that it has found a feature, it sets the valid feature output high and puts the $(x, y)$ coordinates of the feature onto the feature bus.
The binary quantizer contains 30 row buffers. A row buffer is a simple barrel shifter of depth $w_i - w_f$, where $w_i$ is the width of the image and $w_f$ is the width of the FRI. Along with the barrel shifter, the row buffer contains $w_f$ registers which are connected in-line with the shifter and are also accessible as outputs from the binary quantizer core. As each pixel comes into the quantizer, it is shifted into the first register of the first row. The pixel coming out of the last register of the first row is fed into the input of the barrel shifter.
The output of the barrel shifter is fed into the first register of the second row. This process repeats for all 30 rows so that at any point in time a 30 × 30 pixel window is accessible via the registers. At the same time, the binary quantizer is computing a running sum of the values in the registers and holding the current running average of the 30 × 30 window. The 900 registers are grouped into 100 groups of 3 × 3 pixels (resulting in 100 quantized values, one for each 3 × 3 pixel subregion in the FRI). An adder attached to each group holds a running sum of the current 3 × 3 region. The output of the adder is subtracted from the running average of the entire FRI (Figure 6), and the result is fed into a thresholder which returns a single bit for each 3 × 3 subregion. A 120-bit register is used to store the 100 quantized bits plus the 20 bits containing the $(x, y)$ position of the feature. The valid feature signal from the feature detector is buffered so that it will go high when the desired feature point is located in the center of the 30 × 30 register block. When the valid signal goes high, it latches the quantized vector (along with the pixel coordinates) and then sends both to the feature vector output line along with a write-enable signal which inserts the quantized FRI into a feature vector FIFO. The implementation focuses on 640 × 480 resolution UAV images that could return several hundred features. A feature vector FIFO is built to hold up to 2,000 feature vectors per image. The parameters of the feature detector can be adjusted so that more or fewer features are returned if necessary.
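The row-buffer arrangement can be modeled in software as a single delay line: each row contributes a delay equal to the image width (the in-line registers plus the shifter), so a pixel that arrived a fixed number of cycles earlier always sits at a fixed tap. The sketch below is a behavioral model only, not the hardware design. For simplicity it reports each window by its bottom-right pixel rather than by a centered feature point, and the function name is hypothetical.

```python
from collections import deque


def stream_windows(pixels, width, k):
    """Yield (x, y, window) for every full k x k window of a row-major pixel stream.

    (x, y) is the bottom-right pixel of the window. The delay line holds
    (k - 1) * width + k pixels, matching k rows of (registers + shifter).
    """
    delay = deque(maxlen=(k - 1) * width + k)
    for t, pixel in enumerate(pixels):
        delay.appendleft(pixel)           # delay[d] arrived d cycles ago
        x, y = t % width, t // width
        if x >= k - 1 and y >= k - 1:     # window lies fully inside the image
            window = [
                [delay[(k - 1 - r) * width + (k - 1 - c)] for c in range(k)]
                for r in range(k)
            ]
            yield x, y, window
```

With `width=640` and `k=30` the delay line holds 29 × 640 + 30 pixels, which is why the window is available while the image is still streaming in, without storing the whole frame first.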
As soon as there are any features in the feature vector FIFO, computation of descriptors begins. In this way, descriptors are being computed on features even before the entire image has been captured into memory. This streaming data flow allows more features to be processed than would be possible if features were processed only during the interval between frames. The output of the feature vector FIFO is connected to the tree processor component, which computes the feature's descriptor.

Tree Processor.
Figure 6 shows a graphical illustration of the tree processor component. The tree processor component takes a 120-bit feature vector from the feature vector FIFO and computes a 37-bit descriptor, with 20 bits for the $(x, y)$ coordinates and 17 bits for the path decisions that comprise the descriptor. In order to compute the descriptor, the processor must be able to traverse the precomputed BASIS tree. For this purpose, software code was developed to take a precomputed BASIS tree and build an appropriate FPGA ROM that can be loaded during VHDL synthesis. The code reads in the entire tree from disk and then recurses through the tree, storing each node into a one-dimensional array as it is visited. Once the entire tree has been traversed, the code iterates over the array and changes the tree-style pointers to the left and right children into the array indices where the left and right children have been stored. This results in a flat memory file that can be processed in the same fashion as a standard Turing machine. This array is then written to a ROM file which is read during VHDL synthesis.
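The pointer-to-index conversion described above can be sketched as a two-pass flattening. This is an illustration, not the authors' tool: it assumes every node is a dictionary with either two children or none, uses pre-order placement so the root lands at index 0, and writes child index 0 for leaves as a placeholder. The names are hypothetical.

```python
def flatten_tree(root):
    """Return a list of (bdi_index, left_index, right_index, is_leaf) tuples."""
    nodes = []       # nodes in the order they are visited (pre-order)
    index_of = {}    # id(node) -> position in `nodes`

    def visit(node):
        if node is None:
            return
        index_of[id(node)] = len(nodes)
        nodes.append(node)
        visit(node["left"])
        visit(node["right"])

    visit(root)      # pass 1: store each node as it is visited

    rom = []
    for node in nodes:   # pass 2: replace child references with array indices
        left, right = node["left"], node["right"]
        is_leaf = left is None and right is None
        rom.append((
            node["bdi"],
            index_of[id(left)] if left is not None else 0,
            index_of[id(right)] if right is not None else 0,
            is_leaf,
        ))
    return rom
```

Each tuple corresponds to one tree ROM word before bit packing.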

International Journal of Reconfigurable Computing
Each node in the tree is associated with the EDBDI that was used to split the training data. Another program was written that takes in the basis dictionary used to create the tree and writes a VHDL ROM file that contains the 100-bit quantized vectors of each BDI. Table 2 shows a few lines from a BDI ROM containing 64 EDBDIs. The tree processor begins by loading location 0 from the tree ROM (Table 1), which contains the root node of the tree.
The first piece of data from the ROM, the EDBDI vector address, is wired directly to the BDI vector ROM (Table 2), which contains the quantized vectors of all EDBDIs used to build the tree. The 100-bit quantized vector output of the EDBDI ROM is passed into the tree processor along with the remaining bits from the tree ROM, which contain the left and right child memory addresses and a leaf bit. The FRI vector (from the feature vector FIFO) and the EDBDI vector (from the EDBDI ROM) are passed into a Hamming distance calculator (XNOR gates and an adder). The resulting Hamming distance $h$ is compared to Ĥ, a threshold equal to one half of a reference quantity. If $h$ > Ĥ, a value of 1 is pushed into the 37-bit descriptor register (which already holds the 20 bits of $(x, y)$ information that came with the FRI vector); otherwise a value of 0 is recorded. The 0 or 1 value is also passed to the tree address calculator, which is a simple MUX that chooses the left or right node address based on the value of the descriptor element. The resulting next node address is sent to the tree ROM, which starts this process over again at a new memory location in the tree ROM. When a leaf is reached (the leaf bit from the tree ROM is a 1), the finished descriptor is pushed into the descriptor FIFO, the next node address is set to 0, and a pop command is sent to the feature vector FIFO in order to obtain the next feature. This process is repeated until the feature vector FIFO is empty and the feature detector has finished.
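The tree processor loop can be modeled over a flat ROM as follows. This is a behavioral sketch with several assumptions: bit vectors are held as Python integers, the compared value is the XNOR population count (the number of agreeing bits), the threshold is supplied by the caller, ROM entries are `(bdi_addr, left_addr, right_addr, is_leaf)` tuples, and the comparison made at a leaf node is also recorded so that the path length equals the tree depth. The names are hypothetical.

```python
def xnor_popcount(a, b, width=100):
    """Count bit positions where a and b agree (XNOR followed by an adder tree)."""
    mask = (1 << width) - 1
    return bin(~(a ^ b) & mask).count("1")


def tree_processor(fri, tree_rom, bdi_rom, threshold, width=100):
    """Walk the tree ROM from address 0 and return the list of branch bits."""
    addr, path = 0, []
    while True:
        bdi_addr, left_addr, right_addr, is_leaf = tree_rom[addr]
        score = xnor_popcount(fri, bdi_rom[bdi_addr], width)
        bit = 1 if score > threshold else 0
        path.append(bit)
        if is_leaf:
            return path                          # descriptor is complete
        addr = right_addr if bit else left_addr  # the MUX selecting the next node
```

In hardware the returned bits would be shifted into the descriptor register next to the 20 coordinate bits; that packing is omitted here.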

Descriptor Matching.
The final step in the system is to compute matches between two subsequent images in a video sequence. Figure 7 illustrates the correlation core of the TreeBASIS hardware implementation. Recall that the tree processor core pushes descriptors into a descriptor FIFO. The correlation core contains three memory elements in addition to those mentioned previously.
A compared descriptor FIFO holds up to 2,000 37-bit descriptors. When a descriptor is popped out of the descriptor FIFO, it is immediately pushed into the compared descriptor FIFO and passed to the path comparator core. A descriptor BRAM also holds up to 2,000 37-bit descriptors from the previously processed image. When the feature detector reports that it has finished, the feature FIFO is empty, and the descriptor FIFO is empty, the feature transfer core moves all elements in the compared descriptor FIFO into the descriptor BRAM in preparation for the next image to be processed. The matches FIFO holds up to 2,000 40-bit matches. When the path comparator finds a match, the 20-bit $(x, y)$ coordinates from image 1 and the 20-bit $(x, y)$ coordinates from image 2 are concatenated and stored in this FIFO. They can be accessed by other hardware components or saved into RAM for software use. In addition to these memory structures, the correlation core contains two processing elements: the path comparator and the feature transfer core.

Path Comparator.
The path comparator takes in one descriptor from the descriptor FIFO and one from the descriptor BRAM. It compares the two 17-bit descriptors from the 37-bit vectors in order to determine a match. In the current hardware implementation, as well as in the software implementation, only identical descriptor paths are considered to be a correct match. Therefore, the path comparator tests the two 17-bit vectors for equality. If the two vectors are equal, the 20-bit $(x, y)$ coordinates from both descriptor vectors are concatenated and the 40-bit match is pushed onto the matches FIFO. If the vectors are not equal, the path comparator increments the descriptor BRAM address and continues until it either reaches the last descriptor in the BRAM or finds a match. Once this condition is reached, the path comparator requests a new descriptor from the descriptor FIFO (which causes it to be recorded in the compared descriptor FIFO) and resets the descriptor BRAM address to 0. The process continues until the frame is finished, the feature FIFO is empty, and the descriptor FIFO is empty. At that point, an overall "completed" signal is asserted, and the matches FIFO can be accessed or saved in memory for software use.
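The path comparator's scan can be sketched as follows. This is a behavioral illustration that assumes a particular packing not stated in the text: coordinates in the upper 20 bits (10 bits each for x and y), the path in the lower 17 bits, and the previous image's coordinates placed in the upper half of the 40-bit match. The helper `make_descriptor` and the other names are hypothetical.

```python
PATH_BITS = 17
COORD_BITS = 20
PATH_MASK = (1 << PATH_BITS) - 1


def make_descriptor(x, y, path):
    """Pack (x, y) and a 17-bit path into a 37-bit integer (assumed layout)."""
    return (((x << 10) | y) << PATH_BITS) | (path & PATH_MASK)


def match_descriptors(current, previous):
    """For each current descriptor, scan `previous` until the first equal path."""
    matches = []
    for desc in current:
        for prev in previous:             # models the incrementing BRAM address
            if (desc & PATH_MASK) == (prev & PATH_MASK):
                coords_prev = prev >> PATH_BITS
                coords_cur = desc >> PATH_BITS
                matches.append((coords_prev << COORD_BITS) | coords_cur)
                break                     # stop at the first match
    return matches
```

Because the scan stops at the first equal path, each descriptor in the current image yields at most one match.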

Accuracy.
The software system developed in Section 2 provides results with the same accuracy as those of the system implemented in hardware. The results of the bit-level-accurate TreeBASIS system are provided here for clarity of discussion. To calculate the homography accuracy, the same procedure is carried out on the dataset as used in [31, 32]. Table 3 shows the results of the original BASIS algorithm alongside the results of the TreeBASIS algorithm. In the original BASIS descriptor, each descriptor is 128 ternary digits long, requiring 2,304 bits total per descriptor. The BASIS software algorithm (using 2,304-bit descriptors) achieved an accuracy of 75.5% on the Idaho test set [20]. The TreeBASIS algorithm, using 17-bit descriptors and a tree built using 64 BDIs and a training set of 50,000 FRIs, obtained an accuracy of 79.6% on the Idaho test set. SIFT and SURF algorithms were implemented for comparison according to [29, 30].

Synthesis Results.
The TreeBASIS hardware descriptor was written in VHDL, synthesized, and implemented on a Xilinx Virtex-6 VLX760 FPGA. Table 4 shows the result of the system synthesis on this platform. These results are for the complete binary quantizer, tree processor, memory components, and correlator systems. The feature detector is not included in this summary. The TreeBASIS hardware system requires less than 1% of the slice registers on the FPGA, 5% of the slice LUTs, and 30% of available BRAMs. Because the TreeBASIS descriptor system is so small and efficient, there is sufficient FPGA fabric remaining to implement additional computer vision components or to select a smaller FPGA if power and weight constraints dictate. The TreeBASIS system uses less than half of the available resources on the FPGA, making it an ideal fit for low-resource applications.
All timing constraints of the TreeBASIS VHDL design were met without any reduction in clock rate. As such, the entire system will run at the VLX760's top clock rate of 400 MHz. Due to the use of a cascaded adder system and the streaming nature of the pixel data, it requires only one additional clock cycle to compute the binary quantized version of the FRI once a feature has been identified. It requires just 46 clock cycles to compute one descriptor. The image sensor used for this research outputs 640 × 480 pixels on a 25 MHz clock at 30 frames per second. At a clock rate of 400 MHz, the TreeBASIS descriptor FPGA system can calculate over 200,000 descriptors in the 33 milliseconds it takes to obtain one image (at 30 frames per second) using the 25 MHz image sensor clock speed, or over 100,000 descriptors per frame if operating at 60 frames per second.

Conclusion
The TreeBASIS descriptor has been shown to be an effective feature descriptor for the task of UAV aerial frame-to-frame feature matching. It is ideally suited for low-resource implementations such as FPGAs. In this paper, we have presented the development and implementation of the hardware version of the TreeBASIS descriptor. There is no loss of accuracy from the software version to the hardware version because no simplifications or modifications to the algorithm were necessary. TreeBASIS takes advantage of the benefits sparse coding dictionaries provide, and it optimizes descriptor size and comparison speed by using a tree structure to reduce the number of comparisons required. The TreeBASIS descriptor requires a smaller descriptor and less computation time, yet it provides better feature matching accuracy than BASIS, SIFT, and SURF descriptors. The TreeBASIS hardware system, when implemented on a Virtex-6 VLX760 FPGA, utilizes less than 6% of the available slice logic and only 30% of the available BRAM. This allows the majority of the FPGA to be used for additional vision processing or other tasks that a limited-resource system may require. The TreeBASIS descriptor provides a much needed solution to the task of high-level computer vision processing for low-resource systems. The TreeBASIS algorithm is target independent, so any platform can be used for the implementation without any modification. The entire TreeBASIS feature descriptor algorithm has been implemented in hardware, demonstrating that it is a fast and efficient matching solution ideally suited for small, lightweight, embedded platforms.

Figure 1: The basis dictionary, $B$, returned from K-SVD using the entire GoogleMaps dataset.

Figure 2: An example of the binary thresholding of an FRI. This approach reduces the memory footprint of the image from 900 8-bit pixels to a 100-bit vector. The smaller vector requires 100 1-bit hardware comparisons, instead of 900 8-bit comparisons.

Figure 3: The tree creation process. The most effectively descriptive basis dictionary image (EDBDI) is chosen from the set $B$ as the $b$ that most evenly divides the set of training FRIs, $F$. The tree is split using this EDBDI, and the subsets of $F$ are passed to the left and right children. (Figure labels: basis dictionary $B$, FRI training set $F$, subset $F_L$, subset $F_R$.)

Figure 4: The overall hardware design of the TreeBASIS descriptor algorithm. Processing components are shown as green squares and memory components are shown as orange rhomboids. Bit widths of signal paths are denoted next to or below the signal path name in parentheses.

Figure 5: The binary quantizer processing core for the TreeBASIS hardware descriptor contains 30 row buffers which allow it to compute quantized FRI vectors while the image is streaming into the system.

Figure 6: The tree processor core of the TreeBASIS hardware design navigates the tree ROM, pulls quantized EDBDIs from the BDI ROM, and makes decisions on which branch to follow based on the resulting Hamming distance between a node's EDBDI and the FRI.

Figure 7: The matching subsystem of the TreeBASIS hardware implementation requires three additional memory structures and two processing cores.

Conceptually, $F_L$ is the portion of $F$ whose Hamming distance is less than or equal to the threshold (one half of a reference quantity), while $F_R$ is the remaining portion of $F$. Comparing $f_i$ with $b_i$ using a method such as sum of absolute differences (SAD) would average out the differences across the entire 30 × 30 region, without indicating which regions of $f_i$ and $b_i$ are similar. Because the Hamming distance is used, the number of the 100 unique subregions of $f_i$ that are similar to $b_i$ can be calculated, rather than a simple average of similarities and disparities.

Table 1: The first 15 words in the tree ROM.

Table 2: Six lines from a BDI ROM created from a dictionary of 64 EDBDIs.

Table 3: Accuracy results and memory footprints for SIFT, SURF, BASIS, and TreeBASIS on the Idaho dataset. Memory usage assumes that 1,000 features per image are kept for each algorithm.

Table 4: The TreeBASIS descriptor system, including the correlator, takes a very small percentage of the FPGA's available fabric.
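The descriptor throughput quoted for the 400 MHz design (46 clock cycles per descriptor) can be checked with integer arithmetic. The sketch below assumes the tree processor is busy for the whole frame period and ignores FIFO stalls; the function name is hypothetical.

```python
def descriptors_per_frame(clock_hz, cycles_per_descriptor, frames_per_second):
    """Upper bound on descriptors computed during one frame period."""
    cycles_per_frame = clock_hz // frames_per_second
    return cycles_per_frame // cycles_per_descriptor


at_30_fps = descriptors_per_frame(400_000_000, 46, 30)   # 289855
at_60_fps = descriptors_per_frame(400_000_000, 46, 60)   # 144927
```

Both values are consistent with the stated bounds of over 200,000 descriptors per frame at 30 frames per second and over 100,000 at 60 frames per second, and both are far above the 2,000 feature vectors the FIFO is sized for.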