Object detection has attracted much interest because of its wide spectrum of applications, and progress has been driven by the increasing processing power available in software and hardware platforms. In this work, we present an application for multiple-object detection based on the OpenCV libraries. We describe the complexity-related aspects considered in object detection using a cascade classifier. Furthermore, we discuss the profiling and porting of the application to an embedded platform and compare the results with those obtained on traditional platforms. The proposed application targets real-time system implementations, and the results provide a metric for identifying the cases in which object detection applications are more complex and those in which they are simpler.
Object detection aims to determine the location and size of a particular object in an image or a video scene. With the growing need for detection-based security and industrial applications, fast and reliable object detection has attracted much interest.
Two types of applications can use this system: those that do not need real-time responses, such as surveillance cameras that detect certain shapes (for example, animal shapes in a hospital), and those that require a real-time response, such as firearm detection in an airport. The latter require better performance in terms of detection accuracy and response time. The strength of this system is that it can be trained to detect any type of object in different situations. An extension of this work would be to port the system to a low-cost board and adapt it to the board architecture in order to obtain better performance and meet real-time requirements.
There are many types of object detection methods. One of them is knowledge-based methods, which rely on a set of rules about object structure derived from the relationships between the features of the specific object to be detected [
Achieving high performance and near-real-time object detection is a key concern in both large-scale systems and embedded platforms. A reliable and accurate near-real-time object detection application running on an embedded system is therefore crucial, given the rising security concerns in different fields. The cascade classifier is one of the few algorithms able to run in real time. The key to its high performance is the use of the integral image, which requires only basic operations and very little processing time. However, the cascade classifier can only operate at a fixed aspect ratio. There have been many attempts to meet real-time constraints for object detection. Viola and Jones [
The proposed object detection application can be deployed on different platforms, from a high-performance platform to a mobile platform. It can be used in surveillance systems with distributed cameras and a back-end server on which the detection takes place, as well as in mobile devices equipped with a camera and a processor. A very short detection response time is essential for such systems.
In a previous work [
There are many publications related to object detection, but the main challenge remaining in most of the published work is accuracy. Nevertheless, much has been accomplished in terms of the speed and precision of object detection. Improvements have focused mainly on feature selection and classification. Different types of feature selection algorithms have been used, mainly Haar features [
One research trend in object detection is to combine multiple sources of information like color and motion [
A more advanced version of SVM was introduced by Felzenszwalb et al. [
Gall et al. [
Another research direction proposes adding hardware resources. Using parallel CPUs has been shown to give good results in real-time detection. Parallelizing the algorithm across different CPUs is an efficient way to enhance its performance on a specific platform [
The cascade classifier [
The principle of Haar-like features is to detect features that encode information about the class to be detected. The features are adjacent rectangles at a particular position in an image. Figure
Rectangular feature.
As described in [
Rectangle features can be computed rapidly using an intermediate representation of the image called the integral image [
For example (see Figures
Integral image blocks.
Integral image sum.
The example in Figure
Integral image: illustration example.
The integral image is defined as follows:
Therefore, the integral value of a specific pixel is the sum of all pixels above it and to its left, inclusive [
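As a minimal sketch of the idea described above, the following Python code builds an integral image, recovers the sum of any rectangle from at most four lookups, and evaluates a two-rectangle feature. The function names, the tiny test image, and the sign convention (right half minus left half) are assumptions made for illustration and are not taken from the implementation evaluated in this work.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img over rows 0..y and columns 0..x (inclusive)."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(width):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of the pixels in the inclusive rectangle, using at most four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle feature: right half minus left half (assumed sign convention)."""
    half = width // 2
    bottom = top + height - 1
    left_sum = rect_sum(ii, top, left, bottom, left + half - 1)
    right_sum = rect_sum(ii, top, left + half, bottom, left + 2 * half - 1)
    return right_sum - left_sum


# Hypothetical 3x4 grayscale image used only for illustration.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(image)
print(ii[2][3])                           # sum of the whole image
print(rect_sum(ii, 1, 1, 2, 2))           # 6 + 7 + 10 + 11
print(two_rect_feature(ii, 0, 0, 3, 4))   # right two columns minus left two columns
```

Once the integral image is available, the cost of a rectangle sum does not depend on the rectangle size, which is why features at any scale can be evaluated in constant time.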
The Local Binary Pattern (LBP) is a simple and efficient texture operator, which labels the pixels of an image by thresholding the neighborhood of each pixel against the center value and reading the result as a binary number, as shown in Figure
LBP calculation example.
Example of texture primitives detected by LBP.
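The basic LBP computation for a single pixel can be sketched as follows. This is an illustrative example only: the clockwise neighbor ordering starting at the top-left corner, the use of "greater than or equal" as the threshold test, and the sample patch are assumptions, since several conventions exist.

```python
# Clockwise neighbor offsets (row, column) starting at the top-left corner.
# The starting point and direction are a convention assumed for this sketch.
NEIGHBOR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, 1),
    (1, 1), (1, 0), (1, -1),
    (0, -1),
]


def lbp_code(img, y, x):
    """Return the 8-bit LBP code of the pixel at (y, x); it must not lie on the border."""
    center = img[y][x]
    code = 0
    for dy, dx in NEIGHBOR_OFFSETS:
        # A neighbor contributes a 1 bit when it is at least as bright as the center.
        bit = 1 if img[y + dy][x + dx] >= center else 0
        code = (code << 1) | bit
    return code


# Hypothetical 3x3 grayscale patch; the center value is 6.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch, 1, 1)
print(format(code, "08b"), code)
```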
In 2002, Ojala et al. [
A Local Binary Pattern is called uniform if the binary pattern contains at most two bitwise transitions from 0 to 1 or vice versa when the bit pattern is traversed circularly. For example, the patterns 00000000 (0 transitions), 01110000 (2 transitions), and 11001111 (2 transitions) are uniform whereas the patterns 11001001 (4 transitions) and 01010010 (6 transitions) are not.
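The uniformity test can be checked with a few lines of Python. The sketch below counts circular bitwise transitions and reproduces the examples given above; the function names are chosen here for illustration.

```python
def circular_transitions(pattern):
    """Count 0/1 changes between adjacent bits, treating the bit string as circular."""
    n = len(pattern)
    return sum(pattern[i] != pattern[(i + 1) % n] for i in range(n))


def is_uniform(pattern):
    """A pattern is uniform if it has at most two circular transitions."""
    return circular_transitions(pattern) <= 2


for bits in ("00000000", "01110000", "11001111", "11001001", "01010010"):
    print(bits, circular_transitions(bits), is_uniform(bits))
```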
The algorithm used is a variant of the cascade algorithm introduced by Viola and Jones [
The value of each feature is computed by comparing the central area with the eight neighboring areas around it. The result is an 8-bit binary value called the LBP. A number of features make up one stage of the cascade algorithm, and every feature has a positive and a negative weight associated with it. When the feature is consistent with the object to be detected, the positive weight is added to the sum. When the feature is inconsistent with the object, the negative weight is added to the sum. The sum is then compared to the threshold of the stage. If the sum is below the threshold, the stage fails, the cascade terminates early, and processing moves on to the next window. If the sum is above the threshold, the next stage of the cascade is attempted. In general, if no stage rejects a candidate window, it is assumed that the object has been detected.
To avoid redundantly recomputing rectangle sums, integral images are calculated once, which speeds up the calculation of the features.
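The stage logic described above can be sketched as follows. The data layout, the feature names, the weights, and the thresholds are all hypothetical, and treating a sum exactly equal to the threshold as a pass is an assumption of this sketch; the trained OpenCV cascade stores its stages differently.

```python
def evaluate_stage(stage, feature_matches):
    """Sum the positive or negative weight of each feature and compare with the stage threshold."""
    total = 0.0
    for feature in stage["features"]:
        if feature_matches(feature["name"]):
            total += feature["positive_weight"]
        else:
            total += feature["negative_weight"]
    # Assumption: a sum equal to the threshold counts as a pass.
    return total >= stage["threshold"]


def run_cascade(stages, feature_matches):
    """Return (detected, stages_passed); stop at the first failing stage."""
    for index, stage in enumerate(stages):
        if not evaluate_stage(stage, feature_matches):
            return False, index  # early rejection: later stages are never evaluated
    return True, len(stages)


# Hypothetical two-stage cascade used only for illustration.
stages = [
    {"threshold": 0.5, "features": [
        {"name": "f0", "positive_weight": 1.0, "negative_weight": -1.0},
        {"name": "f1", "positive_weight": 0.5, "negative_weight": -0.5},
    ]},
    {"threshold": 1.0, "features": [
        {"name": "f2", "positive_weight": 1.5, "negative_weight": -1.0},
    ]},
]

# Each candidate window is modeled as the set of features consistent with it.
for window in ({"f0", "f1", "f2"}, {"f0", "f1"}, set()):
    print(sorted(window), run_cascade(stages, lambda name: name in window))
```

The third window is rejected by the first stage and never reaches the second one, which is the source of the speed advantage of the cascade.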
Viola and Jones [
The strongest classifier uses the strongest feature, which is the best Haar-like feature, that is, the feature that best separates the positive and negative samples.
Cascade classifier [
Cascade classifier.
When a filter rejects an image region, that subwindow is excluded from further processing and is considered a nonobject, meaning that it does not contain the object to be detected. This is crucial to the performance of the classifier, since all or nearly all negative subwindows are eliminated in the first stage. When an image region passes a filter, on the other hand, it goes on to the following stage, which contains a more complex filter. Only regions that pass all filters are considered to contain a match of the object.
The purpose of the multistage classifier is to reject nonobject subwindows efficiently and rapidly. The next nodes in the chain in Figure
The object detection system was developed on Xeon-based PC servers (E5670 clocked at 2.93 GHz) using gcc 4.4.5.
The training phase was performed on this machine for different objects. A separate XML file was thus generated for each object.
As a reference, detection was performed on a desktop platform using a standard midrange camera, with both live pictures and stored images supplied to the program. These images contain both the objects used in training and other random objects. The detection result is displayed on the computer screen.
The embedded system used in this work is the Texas Instruments DM3730 digital media processor, Figure
DM3730 board.
The operating system (OS) is Angstrom, an embedded Linux distribution. The system files used to boot the processor are stored in NAND flash memory.
To train the cascade classifier to detect a particular object, we use a set of images of the object itself.
To achieve a high detection rate, we needed a large number of images in the training phase. The training set for a particular object contains around 4000 positive images, that is, images that include the targeted object among others. Negative images, which do not include the object, are also used in the training phase.
The dataset includes images of the object taken from different angles so that detection is possible from most angles.
The objects used in this application are faces, hands, and pedestrians (human bodies).
For face detection, we used FEI face database [
The output of the cascade training is an XML file that contains data about the object to be detected. One XML file is generated for each object, and our application then uses this file to perform the detection. The training was performed separately for each of the two algorithms.
The application was implemented in C++ using the OpenCV libraries and compiled with GCC (GNU Compiler Collection).
The GNU C and C++ compilers are needed to compile the program and create an executable file for the target platform.
The embedded Linux environment includes a prebuilt SD card image from which the board can boot and run cross-compiled application code. After code changes, it is worthwhile to rerun a software-only compilation to verify that the changes did not adversely affect the program.
The cascade classifier detection function “detectMultiScale” was given the parameters in Table
Detection function parameters.
Parameter | Value |
---|---|
scaleFactor | 2.0 |
minNeighbors | 4 |
The detection function uses the two listed parameters in Table
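To give an intuition for the effect of scaleFactor, the sketch below lists the window scales that a multiscale search would visit for a given image size. It is a simplified model and not OpenCV's internal loop: the 24-pixel base window and the 640x480 image size are assumptions, and the function name is hypothetical. The value 1.1 is the OpenCV default for scaleFactor and is shown only for comparison with the value of 2.0 used here.

```python
def pyramid_scales(image_width, image_height, window=24, scale_factor=2.0):
    """List the scales at which a square detection window still fits in the image."""
    scales = []
    scale = 1.0
    while window * scale <= min(image_width, image_height):
        scales.append(scale)
        scale *= scale_factor  # each step enlarges the search window by scale_factor
    return scales


coarse = pyramid_scales(640, 480, scale_factor=2.0)
fine = pyramid_scales(640, 480, scale_factor=1.1)
print(coarse)                  # scales visited with scaleFactor = 2.0
print(len(coarse), len(fine))  # number of scales for 2.0 versus 1.1
```

A larger scaleFactor visits far fewer scales, which shortens detection time but can miss objects whose size falls between two consecutive scales. The minNeighbors parameter then sets how many overlapping candidate rectangles are required before a detection is retained.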
The comparison of the two algorithms is performed in two steps.
The first step measures the performance of each algorithm when detecting a single object in a specific scene.
The second step compares the performance of the two algorithms when detecting multiple objects in a specific scene.
Figure
The results of object detection using both algorithms.
Consumed time for each platform.
Platform | Haar-like feature based cascade algorithm (ms) [ | LBP feature based cascade algorithm (ms) |
---|---|---|
Standard platform | 31 | 38 |
Texas Instruments DM3730 | 95 | 90 |
As expected, each algorithm performs better on the regular platform than on the embedded platform. As shown in Table
On the embedded platform, however, LBP is faster than the Haar-like feature algorithm in terms of detection time (90 ms versus 95 ms). This suggests that the LBP algorithm copes better with limited resources, whereas the Haar-like feature algorithm performs better on the regular platform, where more resources are available.
In terms of accuracy, the Haar-like feature algorithm outperforms LBP on the standard platform, with an accuracy rate of 96.24% versus 94.75% for LBP. On the embedded platform, Haar-like accuracy is also slightly better, with a hit rate of 93.56% versus 92.65% for LBP. These percentages are computed from the hit, miss, and false detection rates of the objects to be detected.
Consumed time for each platform for Haar-like algorithm.
Platform | Algorithm | Execution time (ms): 1 object | Execution time (ms): 2 objects | Execution time (ms): 3 objects |
---|---|---|---|---|
Standard platform | HL | 31 | 35 | 43 |
Texas Instruments DM3730 | HL | 95 | 99 | 110 |
The overall performance on the standard platform for multiple objects is better than on the board. However, the detection system on the board remains stable when we increase the number of objects to detect.
The results given by the embedded system can be considered positive and encouraging. For multiple-object detection, the increase in consumed time compared to the standard platform is limited for each algorithm, and it seems to be even smaller for the LBP algorithm on the embedded platform. This makes the use of the system on an embedded platform more realistic.
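The stability observation can be quantified directly from the Haar-like timings reported above. The short calculation below is only an arithmetic illustration: the helper names are hypothetical, and no LBP multiple-object timings are included because they are not listed in the table.

```python
# Haar-like execution times in ms, copied from the table above (1, 2, and 3 objects).
TIMES_MS = {
    "Standard platform": {1: 31, 2: 35, 3: 43},
    "DM3730": {1: 95, 2: 99, 3: 110},
}


def relative_increase(times):
    """Relative growth in execution time when going from 1 to 3 objects."""
    return (times[3] - times[1]) / times[1]


def slowdown(objects):
    """Ratio of embedded to standard execution time for a given number of objects."""
    return TIMES_MS["DM3730"][objects] / TIMES_MS["Standard platform"][objects]


for platform, times in TIMES_MS.items():
    print(platform, f"{relative_increase(times) * 100:.1f}%")
for objects in (1, 2, 3):
    print(objects, f"{slowdown(objects):.2f}x")
```

Going from one to three objects adds 12 ms (about 38.7%) on the standard platform and 15 ms (about 15.8%) on the board, so the ratio between the two platforms shrinks from about 3.06 to about 2.56 as the number of objects grows.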
Figure
Performance on different platforms.
However, for the embedded platform, some enhancements to the algorithms are still needed to meet real-time constraints.
On the other hand, as in the first set of results, LBP gives better performance on the embedded system, which suggests that it performs better in a resource-limited environment. The Haar-like algorithm gives better results than LBP on the standard platform, for both single-object and multiple-object detection in a scene. This suggests that the Haar-like algorithm needs more resources than the embedded system offers in order to perform better.
In all cases, profiling on the embedded platform allowed us to identify the blocks of the algorithms that consume the most processor time. This result helps to identify the algorithm blocks that need enhancement in order to achieve better results on the embedded platform. Independent tasks could then be run on parallel processors to improve detection performance and meet real-time constraints.
In this paper, we discuss the performance of two feature selection algorithms, Haar-like feature selection and LBP, for the detection of a single object and of multiple objects in the same scene, on both a standard platform and an embedded system.
The results presented above help us to determine which algorithm is more efficient in each environment.
We combined both Haar-like feature selection and LBP with a cascade classifier for an accurate comparison.
The goal is to enhance the performance on a low resource embedded system to meet real-time constraints.
From the results above, we can see that on the standard platform the performance of both algorithms meets or is close to real-time object detection.
The next step of this work will be to enhance performance on the embedded platform. This enhancement can be achieved through parallelism: several processors can run separate tasks simultaneously in order to improve performance and response time.
The authors declare that there is no conflict of interests regarding the publication of this paper.