Reliable Recognition of Partially Occluded Objects with Correlation Filters

Design of conventional correlation filters requires explicit knowledge of the appearance and shape of a target object, so the performance of correlation filters is significantly affected by changes in the appearance of the object in the input scene. In particular, the performance of correlation filters worsens when objects to be recognized are partially occluded by other objects, and the input scene contains a cluttered background and noise. In this paper, we propose a new algorithm for the design of a system consisting of a set of adaptive correlation filters for recognition of partially occluded objects in noisy scenes. Since the input scene may contain different fragments of the target, false objects, and background to be rejected, the system is designed so as to guarantee equally high correlation peaks corresponding to parts of the target in the scenes. The key points of the system are as follows: (i) it consists of a bank of composite optimum filters, which yield the best performance for different parts of the target; (ii) it includes a fragmentation of the target into a given number of parts in the training stage to provide equal intensity responses of the system for each part of the target. With the help of computer simulation, the performance of the proposed algorithm for recognition of partially occluded objects is compared with that of common algorithms in terms of objective metrics.


Introduction
Recognition and tracking of objects in observed scenes degraded by additive noise, in the presence of cluttered backgrounds, geometric modifications such as pose changes and scaling, nonuniform illumination, and possible object occlusions are challenges that a modern recognition algorithm must solve. In this paper, we deal with partial occlusion of objects to be recognized, in other words, the case when only some parts of the target are visible. Recent works have paid much attention to this problem [1][2][3][4][5].
Nowadays, object recognition based on correlation filters receives much research interest due to its high impact in real-life activities, such as video surveillance, human-computer interaction, robotics, biometrics, and target tracking [6][7][8][9][10][11][12]. Correlation filtering is a powerful technique for object recognition because of its ability to perform two essential tasks simultaneously: detection of a target within an observed scene and computation of the exact position of the detected object [13,14]. Another advantage of correlation filters is their ability to detect multiple objects in a single scene simultaneously [15][16][17].
The performance of correlation pattern recognition may be improved either by discarding noise components from the output of a linear system [18] or by using an adaptive approach to the filter design [19]. The former approach is suitable for classification problems [20], whereas the latter is preferable for detection and tracking applications. For the case of nonstationary noise such as a cluttered background, statistical parameters of the noise are space-variant. The frequency response of a correlation filter is locally adapted to the parameters estimated in small spatially homogeneous fragments of the input scene. The locally adaptive filter improves pattern recognition in terms of location errors in a noisy environment, which is important for accurate target detection.
Conventional correlation filters without training may perform poorly when recognizing a target partially occluded by other objects [21], for example, a pedestrian partially covered by a tree or a man wearing sunglasses. There are several proposals to treat partial occlusions with correlation filters [22][23][24][25][26][27][28][29]. All of them use independent parts of the target to synthesize a composite correlation filter. However, no study was carried out on an augmented division of the object into parts.
Campos et al. [22] carried out a study on the performance of some correlation filters to discriminate occluded objects. They compared the phase-only filter, the inverse filter, and the trade-off filter between the minimum variance and minimum average correlation energy. All the filters used enhance the edges of the object in order to achieve good discrimination. The target is divided uniformly into seven parts without any justification. Moreover, the performance of the filters in the presence of noise and geometric distortions was not analyzed. Adaptive correlation filters for recognition of fragmented objects embedded into real-life scenes and in the presence of additive noise were presented in [23,24]. The target is divided into independent fragments for the design of an adaptive filter. It was assumed that at least one of the fragments corresponds to a visible fragment of the target embedded into the scene. Additionally, the algorithm uses available contour and texture information to improve recognition of partially occluded objects. Recent work [28] improves recognition of partially occluded objects embedded into a known cluttered background with an adaptive composite filter. The proposed filters are able to discriminate noisy similar objects even when only about 19% of the target information is available. Khoury et al. [30] developed several optimal correlation algorithms for detection of obscured targets embedded into a disjoint background. It was noted that the boundary between obscuring and obscured objects makes a significant contribution to the correlation peak. So, blurring of the boundaries was utilized for detection of obscured targets.
Recently, masked correlation filters (MCFs) were designed [31] to handle partial occlusions in face images. MCFs utilize prior knowledge of the location of partial occlusions in test images as well as the zero-aliasing correlation filtering (ZACF) [27]. Since in real-life applications the location of partial occlusions is usually unknown, the filters cannot be widely used.
Finally, note that, in the design of common correlation-based methods, the target is arbitrarily divided into a number of parts, which are used for the design of composite filters. One of the motivations of this research is to determine a reasonable way of dividing the target that guarantees a high level of overall system performance. In order to obtain good recognition of each target part in noisy input scenes, the optimum correlation filters are also utilized [32].
The paper is organized as follows. Section 2 recalls the design of composite correlation filters. Section 3 describes the proposed algorithm for target fragmentation and robust recognition of partially occluded objects with multiple composite filters. Section 4 presents, with the help of computer simulation, the performance of the proposed algorithms in terms of detection efficiency. The results are discussed and compared with those obtained with common correlation filters. Finally, Section 5 presents our conclusions.

Composite Correlation Filters
We are interested in the design of a correlation filter that is able to recognize a fragment of the target embedded into a disjoint background in a scene corrupted with additive noise. The designed filter should also be able to recognize geometrically distorted versions of the target. Let $T = \{t_i(x, y);\ i = 1, \ldots, N\}$ be an image set containing geometrically distorted versions of the target. The input scene is assumed to be composed of the target $t(x, y)$ embedded into a disjoint background $b(x, y)$ at unknown coordinates $(\tau_x, \tau_y)$, and the scene is corrupted with additive noise $n(x, y)$, as follows:

$$f(x, y) = t(x - \tau_x, y - \tau_y) + b(x, y)\,\bar{w}(x - \tau_x, y - \tau_y) + n(x, y), \tag{1}$$

where $\bar{w}(x, y)$ is a binary function defined as zero inside the target area and unity elsewhere. The optimum filter for detecting the target, in terms of the maximum of the signal-to-noise ratio (SNR) and the minimum variance of location errors (LE), is the generalized matched filter (GMF) [13], whose frequency response is given by

$$H(u, v) = \frac{\left[T(u, v) + \mu_b \bar{W}(u, v)\right]^{*}}{\left|\bar{W}(u, v)\right|^{2} \otimes P_b(u, v) + P_n(u, v)}, \tag{2}$$

where $T(u, v)$ and $\bar{W}(u, v)$ are the Fourier transforms of $t(x, y)$ and $\bar{w}(x, y)$, respectively; $\mu_b$ is the mean value of the background $b(x, y)$; $P_b(u, v)$ and $P_n(u, v)$ denote the power spectral densities of $b_0(x, y) = b(x, y) - \mu_b$ and $n(x, y)$, respectively. The superscript $*$ denotes complex conjugation, and the symbol $\otimes$ denotes convolution.

Let $h_i(x, y)$ be the impulse response of a GMF constructed for the $i$th available view of the target $t_i(x, y)$. Let $H = \{h_i(x, y);\ i = 1, \ldots, N\}$ be the set of all GMF impulse responses constructed for all training images $t_i(x, y)$. Additionally, let $S = \{s_i(x, y);\ i = 1, \ldots, M\}$ be an image set containing $M$ unwanted patterns to be rejected. In order to recognize all target views in $T$ and reject the false patterns in $S$, we synthesize a composite correlation filter by combining the optimal filter templates contained in $H$. The filter $p(x, y)$ can be constructed as follows [33]:

$$p(x, y) = \sum_{i=1}^{N} a_i h_i(x, y) + \sum_{i=N+1}^{N+M} a_i s_{i-N}(x, y), \tag{3}$$

where the coefficients $\{a_i;\ i = 1, \ldots, N + M\}$ are chosen to satisfy prespecified output values for each pattern in $U = T \cup S$. Using vector-matrix notation, we denote by $\mathbf{R}$ a matrix with $N + M$ columns and $d$ rows, where $d$ is the size of the images (the number of pixels) and each column is the vector version of an element of $H \cup S$. Let $\mathbf{a} = [a_i;\ i = 1, \ldots, N + M]^{T}$ be a vector of coefficients. Thus, (3) can be rewritten as

$$\mathbf{p} = \mathbf{R}\mathbf{a}. \tag{4}$$

Let us denote by

$$\mathbf{u} = [\underbrace{1, \ldots, 1}_{N}, \underbrace{0, \ldots, 0}_{M}]^{T} \tag{5}$$

the desired responses to the training patterns and denote by $\mathbf{Q}$ the matrix whose columns are the elements of $U$. The response constraints can be expressed as

$$\mathbf{u} = \mathbf{Q}^{+}\mathbf{p}, \tag{6}$$

where superscript $+$ denotes conjugate transpose. Substituting (4) into (6), we obtain

$$\mathbf{u} = \mathbf{Q}^{+}\mathbf{R}\mathbf{a}, \tag{7}$$

$$\mathbf{a} = \left(\mathbf{Q}^{+}\mathbf{R}\right)^{-1}\mathbf{u}. \tag{8}$$

Finally, substituting (8) into (4), the solution for the composite filter is given by

$$\mathbf{p} = \mathbf{R}\left(\mathbf{Q}^{+}\mathbf{R}\right)^{-1}\mathbf{u}. \tag{9}$$

Note that the value of the correlation peak obtained with (9) is expected to be close to unity for true-class objects and close to zero for false-class objects.
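The closed-form solution for the composite filter can be checked numerically. The following is a minimal sketch, not the authors' implementation: random vectors stand in for the vectorized GMF templates, target views, and false patterns, and the function name `composite_filter` is our own. It solves the constraint equations for the coefficient vector and verifies that the resulting filter gives unit response to true-class patterns and zero response to false-class patterns.

```python
import numpy as np

def composite_filter(R, Q, u):
    """Return p = R (Q^+ R)^{-1} u, where ^+ is the conjugate transpose."""
    a = np.linalg.solve(Q.conj().T @ R, u)  # coefficients from Q^+ R a = u
    return R @ a

rng = np.random.default_rng(0)
d, n_true, n_false = 64, 3, 2                    # pixels, target views, false patterns

# Stand-ins (assumptions): random vectors instead of real images and GMF templates.
targets = rng.standard_normal((d, n_true))       # vectorized target views
templates = rng.standard_normal((d, n_true))     # vectorized GMF impulse responses
false_patterns = rng.standard_normal((d, n_false))

R = np.hstack([templates, false_patterns])       # columns: filter templates and false patterns
Q = np.hstack([targets, false_patterns])         # columns: all training patterns
u = np.array([1.0] * n_true + [0.0] * n_false)   # desired responses

p = composite_filter(R, Q, u)
responses = Q.conj().T @ p                       # should reproduce u
print(np.round(responses, 6))
```

Solving the small linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is usually the numerically safer choice.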
The MACE [34] filter minimizes the average correlation energy of the correlation outputs for the training images while simultaneously satisfying the correlation peak constraints at the origin. The effect of minimizing the average correlation energy is that the resulting correlation planes yield values close to zero everywhere except at the location of a trained object, where an intense peak is produced. In the Fourier domain, the MACE filter can be expressed in vector form as follows:

$$\mathbf{h}_{\mathrm{MACE}} = \mathbf{D}^{-1}\mathbf{R}\left(\mathbf{R}^{+}\mathbf{D}^{-1}\mathbf{R}\right)^{-1}\mathbf{u}, \tag{10}$$

where matrix $\mathbf{D}$ contains along its diagonal the average power spectrum of the training images (i.e., the average of the squared magnitudes of the columns of $\mathbf{R}$).
The Optimal Trade-off Synthetic Discriminant Function (OTSDF) [14] filter is a correlation filter that is similar to the MACE filter. In the OTSDF formulation, matrix $\mathbf{D}$ is replaced with $\mathbf{V} = \mathbf{D} + \alpha\mathbf{I}$, where $\mathbf{I}$ is an identity matrix and $\alpha > 0$. The inclusion of the identity matrix improves noise tolerance.
The discrimination capability (DC) is a measure of the ability of the filter to distinguish a target from unwanted objects; it is defined by the following [33]:

$$\mathrm{DC} = 1 - \frac{|c_b|^{2}}{|c_t|^{2}}, \tag{11}$$

where $c_b$ is the value of the maximum correlation sidelobe in the background area and $c_t$ is the value of the correlation peak generated by the target. A DC value close to unity indicates that the filter has a good capability to distinguish between the target and any false object. Negative values of the DC indicate that the filter is unable to detect the target. Also, if the obtained DC is greater than a prespecified threshold ($\mathrm{DC} > \mathrm{DC}_{\mathrm{th}}$), then the target is considered as detected; otherwise, the target is rejected.
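As an illustration of the vector form of the MACE filter, the sketch below builds the filter from a few random stand-in images and checks the peak constraints in the frequency domain. It assumes the standard MACE expression $\mathbf{D}^{-1}\mathbf{R}(\mathbf{R}^{+}\mathbf{D}^{-1}\mathbf{R})^{-1}\mathbf{u}$; the function name `mace_filter` and the test data are ours, and the constant scale factor of the inverse Fourier transform is ignored.

```python
import numpy as np

def mace_filter(images, u):
    """Frequency-domain MACE filter h = D^{-1} R (R^+ D^{-1} R)^{-1} u."""
    # Columns of R are the vectorized 2D Fourier transforms of the training images.
    R = np.stack([np.fft.fft2(img).ravel() for img in images], axis=1)
    # D is diagonal, so only its diagonal is stored: the average power spectrum.
    # This sketch assumes no frequency has zero power in all training images.
    d_diag = np.mean(np.abs(R) ** 2, axis=1)
    d_inv_r = R / d_diag[:, None]
    coeffs = np.linalg.solve(R.conj().T @ d_inv_r, u)
    return d_inv_r @ coeffs, R

rng = np.random.default_rng(1)
images = [rng.standard_normal((16, 16)) for _ in range(3)]  # stand-in training images
u = np.array([1.0, 1.0, 0.0])                               # two true-class, one false-class

h, R = mace_filter(images, u)
peaks = R.conj().T @ h                                      # constrained values at the origin
print(np.round(peaks.real, 6))
```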
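The effect of adding the identity matrix can be seen on a small example. The sketch below is written under our own assumptions: the positive weight on the identity matrix is called `alpha`, the training images are random stand-ins, and the output noise variance for white noise is taken to be proportional to $\mathbf{h}^{+}\mathbf{h}$. With these assumptions, the OTSDF filter satisfies the same peak constraints as MACE while its output noise variance does not exceed that of MACE.

```python
import numpy as np

def otsdf_filter(R, u, alpha):
    """h = V^{-1} R (R^+ V^{-1} R)^{-1} u with diagonal V = D + alpha * I."""
    v_diag = np.mean(np.abs(R) ** 2, axis=1) + alpha
    v_inv_r = R / v_diag[:, None]
    return v_inv_r @ np.linalg.solve(R.conj().T @ v_inv_r, u)

rng = np.random.default_rng(2)
images = [rng.standard_normal((16, 16)) for _ in range(3)]  # stand-in training images
R = np.stack([np.fft.fft2(img).ravel() for img in images], axis=1)
u = np.ones(3)

h_mace = otsdf_filter(R, u, alpha=0.0)    # alpha = 0 reduces to the MACE filter
h_otsdf = otsdf_filter(R, u, alpha=50.0)  # hypothetical weight chosen for illustration

# For white noise, the output noise variance is proportional to h^+ h.
onv_mace = np.vdot(h_mace, h_mace).real
onv_otsdf = np.vdot(h_otsdf, h_otsdf).real
print(onv_mace, onv_otsdf)
```

Larger values of `alpha` trade peak sharpness for noise tolerance; the specific value used here is illustrative only.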
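As a worked example with hypothetical values, assume the DC takes the usual form of one minus the ratio of the squared sidelobe magnitude to the squared target peak magnitude. For a target peak of 0.9 and a maximum background sidelobe of 0.3, the target passes a threshold of 0.6; if the sidelobe exceeds the target peak, the DC becomes negative.

```latex
\mathrm{DC} = 1 - \frac{|c_b|^{2}}{|c_t|^{2}}
            = 1 - \frac{0.3^{2}}{0.9^{2}}
            = 1 - \frac{0.09}{0.81} \approx 0.889 > 0.6
\quad\text{(detected)},
\qquad
\mathrm{DC} = 1 - \frac{1.08^{2}}{0.9^{2}} = 1 - 1.44 = -0.44 < 0
\quad\text{(not detected)}.
```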

Recognition of Partially Occluded Objects
In this section, we describe the proposed algorithm for recognition of partially occluded objects using a new target fragmentation procedure and a bank of composite correlation filters. To improve the detection performance of correlation filters, an adaptive approach to the filter design is utilized [35].
The proposed algorithm for automatic fragmentation of the target into $K$ parts is shown in Figure 1.
First, suppose that a visible fragment of the object obtained due to occlusion of the object always contains a part of the object contour. So, we define fragments $t_1, \ldots, t_K$ as sectors of the circle inscribing the object. For each of the fragments $t_1, \ldots, t_K$ and the object $t(x, y)$, the output of the linear system can be defined as the correlation peak between the impulse response of the correlation filter and the corresponding fragment or the entire object, that is, $c_{t_1}, \ldots, c_{t_K}$, and $c_t$, respectively. In general, the impulse response of the optimum filter (see (2)) depends on the input scene information. If such information is unavailable, the phase-only filter may be used for an approximate solution of the problem. We want to divide the target into $K$ sectors in such a manner as to obtain equal responses of the linear system to each fragment. In other words, among all possible divisions $\{t_k\}_{k=1}^{K}$ of the object $t(x, y)$, we look for the one that minimizes the functional (12), which measures how far the fragment responses $c_{t_1}, \ldots, c_{t_K}$ are from being equal. Actually, minimization of the functional (12) may produce numerous solutions. In this case, additional constraints such as equal area of the fragments can be used to select a unique one.
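The idea of dividing the target into sectors with equal responses can be sketched as follows. This is an illustration under strong assumptions, not the authors' procedure: the filter is taken to be a plain matched filter, so the response to a fragment at the origin is proportional to the fragment energy; the sectors are measured about the image center; and the function name `sector_fragments` is hypothetical. Under these assumptions, equal responses reduce to equal energy per sector, which is obtained by cutting the cumulative energy over the angle into $K$ equal parts.

```python
import numpy as np

def sector_fragments(target, k, start_angle=0.0):
    """Split `target` into k angular sectors with near-equal energy."""
    h, w = target.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Angle of every pixel about the image center, measured from start_angle.
    angle = (np.arctan2(yy - cy, xx - cx) - start_angle) % (2 * np.pi)
    energy = target.astype(float) ** 2

    # Sweep the pixels by increasing angle and accumulate their energy.
    order = np.argsort(angle, axis=None)
    cum = np.cumsum(energy.ravel()[order])
    # A pixel belongs to sector i if the cumulative energy lies in the i-th k-th part.
    # Pixels with identical angles may be split between neighboring sectors.
    labels_sorted = np.minimum((cum / cum[-1] * k).astype(int), k - 1)
    labels = np.empty(h * w, dtype=int)
    labels[order] = labels_sorted
    labels = labels.reshape(h, w)
    return [np.where(labels == i, target, 0.0) for i in range(k)]

rng = np.random.default_rng(3)
target = rng.uniform(0.5, 1.0, (32, 32))      # stand-in for a target image
fragments = sector_fragments(target, 4)
energies = [float((f ** 2).sum()) for f in fragments]
print(np.round(energies, 2))
```

Different values of `start_angle` give different divisions with the same property, which mirrors the remark that the minimization may have many solutions and that an extra constraint is needed to select one.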
Figure 2 illustrates the optimal fragmentation of the object from 2 to 9 fragments with the help of the proposed algorithm.
The filter design requires knowledge of a typical background image and a target.We construct a bank of composite optimum filters.The proposed algorithm for the design of composite correlation filters is given as follows.
Step 3. For $k = 1, \ldots, K$, synthesize a composite adaptive correlation filter $p_k(x, y)$ as follows [35]:
(i) perform correlation between $h_k(x, y)$ and the background image. If the obtained DC is greater than a prespecified threshold ($\mathrm{DC} > \mathrm{DC}_{\mathrm{rec}}$), then the fragment can be successfully detected in the input scene, and $t_k(x, y)$ is added to the set $T$; otherwise, the detected object $s_k(x, y)$ around the false peak is added to the set $S$;
(ii) synthesize a composite filter $p_k(x, y)$ with the help of (9);
(iii) iteratively perform steps (i) and (ii) with $p_k(x, y)$ until the condition $\mathrm{DC} > \mathrm{DC}_{\mathrm{rec}}$ is satisfied.
Step 4. The bank of composite adaptive filters $\{p_k(x, y)\}_{k=1}^{K}$ is used for reliable recognition of partially occluded objects. Detection can be carried out by correlating the input scene with each filter of the bank. Next, the DC in each of the correlation planes is calculated, and the plane with the highest DC value is chosen as the system output. If the obtained DC is greater than a prespecified threshold ($\mathrm{DC} > \mathrm{DC}_{\mathrm{th}}$), then the target is considered as detected; otherwise, the object is rejected. Finally, the location of the correlation peak in the chosen output plane is taken as an estimate of the location of the object in the scene. The recognition procedure with the bank of filters is summarized in Figure 3.
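The detection stage with a bank of filters can be sketched as follows. This is a simplified illustration under our own assumptions: the filters are given as spatial templates, correlation is circular and computed with the FFT, and, because the target position is unknown at detection time, the sidelobe is taken as the largest magnitude outside a small guard window around the peak. The function names and the `guard` parameter are hypothetical.

```python
import numpy as np

def correlate(scene, template):
    """Circular cross-correlation via FFT; the template is zero-padded to the scene size."""
    F = np.fft.fft2(scene)
    H = np.fft.fft2(template, s=scene.shape)
    return np.real(np.fft.ifft2(F * np.conj(H)))

def discrimination_capability(plane, peak, guard=3):
    """DC = 1 - |c_b|^2 / |c_t|^2 with the sidelobe c_b taken outside a guard window."""
    h, w = plane.shape
    py, px = peak
    sidelobes = np.abs(plane)
    ys = np.arange(py - guard, py + guard + 1) % h
    xs = np.arange(px - guard, px + guard + 1) % w
    sidelobes[np.ix_(ys, xs)] = 0.0          # exclude the neighborhood of the peak
    c_t = abs(plane[py, px])                 # assumed nonzero in this sketch
    c_b = sidelobes.max()
    return 1.0 - c_b ** 2 / c_t ** 2

def detect(scene, bank, dc_th=0.6):
    """Correlate the scene with every filter and keep the plane with the highest DC."""
    best = None
    for k, template in enumerate(bank):
        plane = correlate(scene, template)
        peak = np.unravel_index(np.argmax(plane), plane.shape)
        dc = discrimination_capability(plane, peak)
        if best is None or dc > best[0]:
            best = (dc, k, peak)
    dc, k, peak = best
    return {"detected": bool(dc > dc_th), "dc": float(dc),
            "filter": k, "location": tuple(int(v) for v in peak)}

# Toy data: a Barker-code pattern plays the role of the visible fragment.
barker = np.array([1, 1, 1, -1, -1, 1, -1], dtype=float)
fragment_a = np.outer(barker, barker)
fragment_b = np.ones((7, 7))
scene = np.zeros((64, 64))
scene[20:27, 30:37] = fragment_a             # top-left corner at row 20, column 30

result = detect(scene, [fragment_a, fragment_b])
print(result)
```

With this correlation convention, the reported location is the top-left corner of the matching template in the scene. The cost is one correlation per filter in the bank, which is the source of the trade-off between accuracy and complexity.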

Note that in the proposed algorithm the number $K$ of fragments can be properly chosen to ensure a desired accuracy using a minimum number of correlation operations. Detection performance and location accuracy increase monotonically with the number of filters in the bank; therefore, there exists a trade-off between a desired quality of detection and computational complexity in terms of required correlations.

Computer Simulation
In this section, the performance of the proposed algorithm for recognition of partially occluded objects is presented in terms of detection efficiency. The results are compared with those obtained with successful composite correlation filters, that is, MACE [34], OTSDF [22], SDF [14,35], and SDF with MACE (SMACE) filters [23,24]. In this paper, type I and type II recognition errors are used for comparing the recognition accuracy of the tested algorithms. A type I error occurs when the algorithm asserts something that is absent (a false hit); it is called a false positive (FP). A type II error occurs when the algorithm fails to assert what is present (a miss); it is called a false negative (FN).
With the help of extensive computer simulation, we show how detection reliability and localization accuracy for recognition of partially occluded objects with common and proposed correlation filters depend on the number of fragments, level of input noise, and level of target overlapping. In order to guarantee statistically correct results, we use 55 different scenes and 16 different objects (see Figure 4). The algorithms are tested in input scenes containing the entire target and randomly occluded targets with the level of available information of 20%, 40%, 60%, 80%, and 100% of the object area. Also, input scenes are corrupted by additive white noise with signal-to-noise ratio (SNR) of 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB.
Tables 1 and 2 show the FP and FN errors for the proposed algorithm as a function of the number $K = 1, \ldots, 9$ of fragments, SNR of 5 dB, 10 dB, 15 dB, and 25 dB, and the level of available information of 20%, 40%, 60%, 80%, and 100%. Note that the threshold $\mathrm{DC}_{\mathrm{th}} = 0.6$ provides the minimum of the FP and FN errors. One can observe that the proposed algorithm with a strong occlusion (20% of available information) yields high FN errors of 62%-100% depending on the number of fragments and noise level. However, the recognition performance of the algorithm improves rapidly when either the number of fragments or the SNR increases. On the other hand, the performance of the proposed algorithm in terms of FP errors is excellent.
Tables 3 and 4 show FP and FN errors for the proposed algorithm with division of the object into 9 fragments ($K = 9$), MACE, OTSDF, SDF, and SDF with MACE.
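The two error types can be computed from thresholded DC values as in the sketch below. The normalization is our assumption, since the text does not state the denominators: the FP rate is taken relative to the number of trials without a target and the FN rate relative to the number of trials with a target. The DC values and ground truth are made-up numbers for illustration.

```python
def error_rates(decisions, truths):
    """Return (FP, FN) rates in percent from boolean detections and ground truth."""
    fp = sum(d and not t for d, t in zip(decisions, truths))   # asserted but absent
    fn = sum(t and not d for d, t in zip(decisions, truths))   # present but missed
    negatives = sum(not t for t in truths)
    positives = sum(bool(t) for t in truths)
    fp_rate = 100.0 * fp / negatives if negatives else 0.0
    fn_rate = 100.0 * fn / positives if positives else 0.0
    return fp_rate, fn_rate

dc_values = [0.95, 0.40, 0.72, 0.10, 0.65]    # hypothetical DC per trial
truths = [True, True, False, False, True]     # whether the target was really present
decisions = [dc > 0.6 for dc in dc_values]    # detection rule DC > DC_th

fp_rate, fn_rate = error_rates(decisions, truths)
print(fp_rate, round(fn_rate, 2))             # 50.0 33.33
```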
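A scene with a prescribed noise level can be generated as in the following sketch. It assumes that the SNR is the ratio of the mean signal power to the noise variance and that the noise is Gaussian; the text specifies neither, so both are our assumptions, as is the function name `add_white_noise`.

```python
import numpy as np

def add_white_noise(scene, snr_db, rng):
    """Add zero-mean white Gaussian noise so that signal power / noise variance = SNR."""
    signal_power = np.mean(scene.astype(float) ** 2)
    noise_var = signal_power / (10.0 ** (snr_db / 10.0))
    return scene + rng.normal(0.0, np.sqrt(noise_var), scene.shape)

rng = np.random.default_rng(4)
scene = rng.uniform(0.0, 1.0, (256, 256))     # stand-in for a test scene
noisy = add_white_noise(scene, snr_db=10.0, rng=rng)

# Empirical SNR of the generated scene, in dB.
measured = 10.0 * np.log10(np.mean(scene ** 2) / np.var(noisy - scene))
print(round(float(measured), 2))
```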
The tested algorithms use the same number of fragments.However, the division of the object into nonoverlapping fragments is performed uniformly according to the authors' recommendation.One can observe that the proposed algorithm yields much better performance for different circumstances with respect to the objective criteria.

Conclusion
In this paper, we proposed a new algorithm for the design of a linear system consisting of a set of adaptive correlation filters for recognition of partially occluded objects in noisy scenes.
The system consists of a bank of composite optimum filters, which yield the best performance for different parts of the target. In the training stage, the system divides the target into a given number of parts to provide equal intensity responses of the system for each part of the target. With the help of computer simulation, we showed that the performance of the proposed algorithm for recognition of partially occluded objects is much better than that of common algorithms in terms of objective metrics.

Figure 1: Block diagram of optimal fragmentation of the target into $K$ fragments.

Figure 3: Block diagram of the proposed algorithm for partially occluded object recognition.

Figure 4: Examples of scenes with different objects.