Background subtraction is a popular method for detecting foreground objects and is widely adopted as a fundamental processing step in advanced applications such as tracking and surveillance. The color coherence vector (CCV) includes both the color distribution information (histogram) and the local spatial relationships of colors, so it overcomes a weakness of the conventional color histogram for representing an object. In this paper, we introduce a background subtraction method based on a fuzzy color coherence vector (FCCV). After applying fuzzy c-means clustering to the color coherence and color incoherence subvectors, we build a region-based fuzzy statistical feature for each pixel from the resulting fuzzy membership matrices. These features, extracted from consecutive frames, are used to build the background model and detect the moving objects. The experimental results demonstrate the effectiveness of the proposed approach.
1. Introduction
With the increasing significance and popularity of digital multimedia applications such as video surveillance, object tracking, action understanding, anomaly detection, and gait recognition, the detection of moving objects in video frames plays a critical role in the subsequent tasks. A detection system with high accuracy improves the overall performance of the further processing. Therefore, various methods for detecting moving objects have been proposed in recent years. However, robust detection of moving objects in the presence of complex backgrounds remains a challenging problem.
The basic detection method is based on frame difference. For consecutive frames of the video, it compares the intensities of the pixels at the same location and extracts the objects with an appropriate threshold. However, this approach is sensitive to various disturbances, such as illumination changes, swaying vegetation, rippling water, and changes in the background geometry. When the image sequence contains many frames and there is little change between consecutive frames, background modeling and subtraction is another solution to the detection problem [1].
The heart of a background subtraction method is background modeling, which uses each new video frame to calculate and update a background model. The resulting background model provides a description of the entire background scene, and moving objects can be detected by identifying the pixels that are not consistent with it. Many different approaches have been proposed for modeling the background. A simple but often effective parametric approach is to model each pixel of a video frame with a single Gaussian distribution, whose parameters (mean and covariance) can be recursively updated with a simple adaptive filter [2]. This model works well for a static background. In natural environments, however, even with a fixed camera the background appears dynamic because it includes repetitive motions such as swaying vegetation, rippling water, and flickering monitors. To model a dynamic background, a single Gaussian becomes inappropriate and the mixture of Gaussians (MoG) model is preferable [3, 4]. When changes in the background are fast, Gaussian-assumption-based modeling is not suitable, and a nonparametric approach was proposed instead [5]. This method uses general nonparametric kernel density estimation (KDE) to build a statistical representation of the scene background. A major drawback of this approach is that it ignores the time-series nature of the problem; moreover, KDE requires training data from a sequence of examples with a relatively "clean" background [6].
Pixel-based methods rely on the observation that the background model can be well established when the background is static and sufficient training frames are available. Consequently, for dynamic-texture backgrounds and real-time applications, these pixel-based methods do not perform well. To overcome this limitation, researchers have turned to region-based methods, which usually divide an image into (overlapping or nonoverlapping) blocks and calculate block features, from which the background model is built and the foreground detected. Mason and Duric [7] extracted a feature vector based on the edge histogram to describe a block and then detect the moving objects. Heikkilä and Pietikäinen [2] proposed a block-based background subtraction method based on the local binary pattern (LBP) texture measure, modeling each image block as a group of weighted adaptive LBP histograms. Zhang et al. [8] proposed the spatiotemporal local binary pattern (STLBP) to model dynamic textures. In another work [9], they further used a covariance descriptor, defined over various spatial and texture features, to efficiently suppress dynamic textures in the background. Recently, several authors have explored fuzzy approaches to different aspects of moving object detection. Sigari et al. [10] proposed a fuzzy version of the running average method (FG) for background subtraction. Maddalena and Petrosino [11] introduced a fuzzy learning factor into the background model update procedure (FASOM). W. Kim and C. Kim [12] introduced fuzzy color histogram (FCH) based block features for background subtraction, in which the background model is reliably constructed by computing the similarity between local FCH features with an online update procedure.
A color histogram provides only a distribution of the colors. The FCH can attenuate color variations generated by background motions. However, the color histogram is a very coarse image characterization, so images with similar histograms can have dramatically different appearances. The color coherence vector (CCV) includes both the color distribution information (histogram) and the local spatial relationship information. Therefore, this paper proposes a background subtraction method based on a fuzzy color coherence vector, derived from the CCV by applying fuzzy c-means clustering. The local features extracted from the fuzzy color coherence vector are used to build and update the background model and detect the moving objects.
This paper is organized as follows. In Section 2, we provide a brief overview of color coherence vector and introduce the fuzzy color coherence vector. Section 3 presents the background modeling and moving objects detection method. To demonstrate the effectiveness of the proposed background subtraction method, the experimental results on various image sequences will be shown and analyzed in Section 4. Section 5 draws a conclusion for the paper.
2. Fuzzy Color Coherence Vector

2.1. Color Coherence Vector
Color is an essential and important property of an image; thus, it has been explored in various image processing problems, including background subtraction. A color histogram can be regarded as a probability density function describing the probability that a pixel in the image belongs to a given color bin. It is computationally efficient and generally insensitive to small changes in camera position and to geometric transformations such as rotation. However, a color histogram provides only a distribution of the colors. It is a very coarse image characterization, so images with similar histograms can have dramatically different appearances.
In order to overcome the limitations of the conventional color histogram, Pass and Zabih [13] proposed a histogram refinement method and a new color characterization, the color coherence vector (CCV), which includes both the color distribution information (histogram) and the local spatial relationship information. As shown in Figure 1, two images can have identical color histograms yet represent different objects. Because the CCV encodes the spatial relationships of pixel colors, the two images can be distinguished by their CCVs.
Figure 1: Two images with the same histograms, but different CCVs.
A usual RGB image has three color channels with 256 levels each, which imposes a heavy computational burden for obtaining the CCV. In addition, the RGB color space does not align well with human visual perception. Therefore, we first convert RGB into HSV color space. Taking into account both computational complexity and perceptual representation performance, the HSV image is quantized into a gray image with m bins (levels). This paper adopts the following quantization scheme: 4 hue bins with a step of 90° (boundaries at 315°, 45°, 135°, 225°, wrapping back to 315°), 4 saturation bins with a step of 0.25 (boundaries 0, 0.25, 0.5, 0.75, 1.0), and 8 value bins with a step of 0.125 (boundaries 0, 0.125, 0.25, …, 1.0), which generates a gray image LHSV with 4 × 4 × 8 = 128 bins [14]. In order to improve the detection performance, we also quantize the value (V) channel into 128 bins, resulting in another gray image LV with 128 bins. The images LHSV and LV are combined to detect the moving objects.
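As an illustration of this quantization step, the following minimal numpy sketch maps HSV channels to the two gray images LHSV and LV. The function name `quantize_hsv` and the 45° shift used to align the hue bins with the boundaries listed above are our own choices, not part of the paper:

```python
import numpy as np

def quantize_hsv(h, s, v):
    """Quantize HSV channels into an index image L_HSV with
    4*4*8 = 128 bins and a 128-bin quantization L_V of the V channel.
    h is in degrees [0, 360); s and v are in [0, 1]."""
    # 4 hue bins of 90 deg; shifting by 45 deg puts the bin edges at
    # 315, 45, 135, 225 degrees as in the quantization scheme above
    h_bin = (((h + 45.0) % 360.0) // 90.0).astype(int)       # 0..3
    s_bin = np.minimum((s / 0.25).astype(int), 3)            # 0..3
    v_bin = np.minimum((v / 0.125).astype(int), 7)           # 0..7
    l_hsv = (h_bin * 4 + s_bin) * 8 + v_bin                  # 0..127
    l_v = np.minimum((v * 128).astype(int), 127)             # 0..127
    return l_hsv, l_v
```

The clamping with `np.minimum` only affects the exact upper boundary s = 1.0 or v = 1.0, which would otherwise spill into a nonexistent bin.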
In the CCV framework, a pixel within a given color bin is classified as either coherent or incoherent. A coherent pixel is part of a sizable contiguous region, while an incoherent pixel is not. The CCV records this classification for every color bin in the image. To determine coherence, we compute connected components among the pixels of a given color bin using 8-connected neighbors. A pixel is coherent if the size of its connected component exceeds a threshold; otherwise, it is incoherent.
Suppose that αi and βi are the numbers of coherent and incoherent pixels in the ith color bin of the histogram, respectively. Then the color histogram of an image with m color bins is described as 〈(α1+β1), (α2+β2), …, (αi+βi), …, (αm+βm)〉 and the CCV of the image is defined as 〈(α1, β1), (α2, β2), …, (αi, βi), …, (αm, βm)〉 [13, 15]. The threshold th for judging coherence is defined as follows:
$$\mathrm{th} = \frac{\gamma \left( N_{i1} + N_{i2} + \cdots + N_{ij} + \cdots + N_{in_i} \right)}{n_i}, \tag{1}$$
where Nij denotes the number of pixels in the jth connected region of the ith color bin, ni is the total number of such regions, and γ is a weighting coefficient that controls the accuracy of detection and is obtained by practical training. As shown in Figure 1, each image contains six green pixels, so the two images have the same histograms. However, if we set the threshold th to 2, the corresponding elements in the CCVs of the two images become (3, 3) and (6, 0), respectively.
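The CCV construction described above can be sketched as follows. This is a minimal implementation assuming scipy is available; since the text does not specify whether a component whose size exactly equals the threshold counts as coherent, ≥ is used here:

```python
import numpy as np
from scipy import ndimage

def color_coherence_vector(label_img, m=128, gamma=0.5):
    """Compute the CCV <(alpha_i, beta_i)> of a quantized image.
    A pixel is coherent if its 8-connected component within its own
    color bin reaches the per-bin threshold of Eq. (1):
    th = gamma * (mean component size of that bin)."""
    ccv = np.zeros((m, 2), dtype=int)
    eight = np.ones((3, 3), dtype=int)   # 8-connectivity structure
    for i in range(m):
        mask = (label_img == i)
        if not mask.any():
            continue
        comps, n = ndimage.label(mask, structure=eight)
        sizes = ndimage.sum(mask, comps, index=range(1, n + 1))
        th = gamma * sizes.sum() / n          # Eq. (1)
        ccv[i, 0] = int(sizes[sizes >= th].sum())   # coherent alpha_i
        ccv[i, 1] = int(sizes[sizes < th].sum())    # incoherent beta_i
    return ccv
```

With gamma = 1 the threshold is simply the mean component size of the bin, which reproduces the (3, 3) versus (6, 0) distinction of the Figure 1 example when the sizes differ across components.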
When an object moves into a region, not only the local color histograms but also the local spatial relationships of colors in that region change. Since the CCV includes both the distribution and the spatial relationship information of colors, the CCVs of LHSV and LV should provide better clues for moving object detection.
2.2. Fuzzy Color Coherence Vector
Based on the observation that background motions (dynamic texture) do not severely alter the scene structure, W. Kim and C. Kim [12] adopted the fuzzy color histogram (FCH) to attenuate color variations generated by background motions and improve moving object detection performance. The CCV refines the color histogram by considering local spatial relationships. Therefore, this paper combines fuzzy c-means clustering with the CCV and introduces the fuzzy color coherence vector (FCCV) for background subtraction.
After obtaining the conventional CCVs of LHSV and LV, 〈(α1HSV, β1HSV), …, (αmHSV, βmHSV)〉 and 〈(α1V, β1V), …, (αmV, βmV)〉, we apply the fuzzy c-means (FCM) algorithm to the combined coherence subvector 〈α1, α2, …, αm×m〉, where αk = (αiHSV, αjV) and k = (i−1)×m + j, and obtain a membership matrix UC and d clusters. The matrix UC describes the degree to which each element αk belongs to each cluster. To run the FCM algorithm, the initial membership values uij are randomly generated subject to the constraint ∑i=1d uij = 1, 1 ≤ j ≤ m×m; the initial clustering centers ci are obtained using (2), and then both are updated iteratively as [12]:
$$u_{ij} = \frac{1}{\sum_{k=1}^{d} \left( \left\| \alpha_j - c_i \right\| / \left\| \alpha_j - c_k \right\| \right)^{2/(r-1)}}, \quad 1 \le i \le d,\ 1 \le j \le m \times m,$$

$$c_i = \frac{\sum_{j=1}^{m \times m} u_{ij}^{\,r}\, \alpha_j}{\sum_{j=1}^{m \times m} u_{ij}^{\,r}}, \quad 1 \le i \le d, \tag{2}$$
where r is a constant that controls the spread of the fuzzy clusters and is set to 2. The clustering process stops when the maximum number of iterations (100) is reached, or when the improvement of the objective function between two consecutive iterations falls below a specified threshold. The same clustering method is applied to the combined incoherence subvector 〈β1, β2, …, βm×m〉, where βk = (βiHSV, βjV) and k = (i−1)×m + j, to obtain a membership matrix UI.
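A plain-numpy sketch of the FCM iteration described above. The function name, fixed seed, and tolerance are illustrative; the membership update uses the textbook FCM exponent 2/(r − 1) on the distance ratio, which matches Eq. (2) for non-squared norms:

```python
import numpy as np

def fuzzy_c_means(x, d, r=2.0, max_iter=100, tol=1e-6, seed=0):
    """FCM on points x (n x dim).  Returns the membership matrix
    U (d x n, columns sum to 1) and the cluster centres c (d x dim)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    u = rng.random((d, n))
    u /= u.sum(axis=0, keepdims=True)        # enforce sum_i u_ij = 1
    prev_obj = np.inf
    for _ in range(max_iter):
        um = u ** r
        c = (um @ x) / um.sum(axis=1, keepdims=True)       # centre update, Eq. (2)
        dist = np.linalg.norm(x[None, :, :] - c[:, None, :], axis=2)
        dist = np.maximum(dist, 1e-12)       # avoid division by zero
        tmp = dist ** (-2.0 / (r - 1.0))     # membership update, Eq. (2)
        u = tmp / tmp.sum(axis=0, keepdims=True)
        obj = float((u ** r * dist ** 2).sum())
        if abs(prev_obj - obj) < tol:        # objective-improvement stop rule
            break
        prev_obj = obj
    return u, c
```

In the paper the points x would be the m×m combined coherence (or incoherence) subvector elements, each a 2-D point (αiHSV, αjV).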
Once the two fuzzy membership matrices are obtained, each pixel of the HSV image can be assigned membership values to the coherence and incoherence clustering centers, respectively, using the corresponding gray values of the LHSV and LV images. Both membership matrices UC and UI are calculated for the first video frame and stored in advance, so they can be treated as two look-up tables. For each pixel of the remaining video frames, we obtain its membership values to the coherence and incoherence clustering centers from the gray values of the LHSV and LV images by table look-up, without recomputing the membership matrices.
In order to construct the background model, we define the region-based feature FCCV, which can be easily built by referring to the stored membership matrices UC and UI. Each video frame is converted into HSV color space and quantized into two gray images, as for the first frame. For a pixel at position g, suppose the gray values of the LHSV and LV images are i and j (1 ≤ i, j ≤ m), respectively. Then the membership value ug,hC of this pixel to the hth coherence clustering center is the (h, k)th element [UC]hk (k = (i−1)×m + j) of the matrix UC, while the membership value ug,hI to the hth incoherence clustering center is the (h, k)th element [UI]hk of the matrix UI. The region-based feature vector (FCCV) Fg of the gth pixel is defined as follows:
$$F_g = \left[ F_g^C, F_g^I \right] = \left[ f_{g,1}^C, f_{g,2}^C, \ldots, f_{g,d}^C, f_{g,1}^I, f_{g,2}^I, \ldots, f_{g,d}^I \right],$$

$$f_{g,h}^{p} = \sum_{z \in W_g} u_{z,h}^{p}, \quad p = C, I;\ h = 1, \ldots, d, \tag{3}$$
where Wg denotes the set of positions in the region centered at location g, and uz,hp is the membership value of the pixel at position z to the hth coherence (incoherence) clustering center. By comparing the feature vectors at the same pixel location in consecutive frames, we can detect the moving objects and update the background model.
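The window sum of (3) over every pixel can be computed with a box filter. Below is a sketch assuming scipy; the function name and array layout (a d-row membership look-up table per matrix, features stacked along the last axis) are our own conventions:

```python
import numpy as np
from scipy import ndimage

def fccv_features(l_hsv, l_v, u_c, u_i, m=128, win=5):
    """Per-pixel FCCV F_g of Eq. (3): for each of the d coherence and
    d incoherence clusters, sum the membership values over a win x win
    window centred at each pixel.  u_c and u_i are the (d, m*m) FCM
    membership matrices used as look-up tables."""
    d = u_c.shape[0]
    # combined bin index k = (i-1)*m + j, here 0-based
    k = l_hsv * m + l_v
    h_img, w_img = k.shape
    feats = np.empty((h_img, w_img, 2 * d))
    area = win * win
    for h in range(d):
        # look up the membership map of cluster h, then box-sum it:
        # uniform_filter averages over the window, so multiply by its area
        feats[..., h] = ndimage.uniform_filter(u_c[h, k], size=win) * area
        feats[..., d + h] = ndimage.uniform_filter(u_i[h, k], size=win) * area
    return feats
```

Near the image border, `uniform_filter` reflects the image; the paper does not state how border windows are handled, so this is an assumption.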
3. Background Subtraction Methods
The background model BF is initialized with the feature vector FCCV of each pixel of the first frame of the image sequence. For the current frame, we compute the FCCV of every pixel and compare it with the FCCV at the corresponding position of the background model using a similarity measure. The pixel is then classified as background (Og = 0) or foreground (Og = 1) as follows:
$$O_g = \begin{cases} 0, & D\left(F_g, BF_g\right) > \tau, \\ 1, & \text{otherwise}, \end{cases} \tag{4}$$
where D(·,·) is a similarity measure defined as
$$D\left(F_g, BF_g\right) = \frac{\sum_{h=1}^{d} \min\left(f_{g,h}^C, b_{g,h}^C\right)}{\max\left(\sum_{h=1}^{d} f_{g,h}^C, \sum_{h=1}^{d} b_{g,h}^C\right)} \cdot \varepsilon + \frac{\sum_{h=1}^{d} \min\left(f_{g,h}^I, b_{g,h}^I\right)}{\max\left(\sum_{h=1}^{d} f_{g,h}^I, \sum_{h=1}^{d} b_{g,h}^I\right)} \cdot \left(1 - \varepsilon\right), \tag{5}$$
where ε is an empirical parameter that weights the coherent and incoherent features when comparing the two descriptions. That is, a high similarity between a pixel of the current frame and the background model implies that the pixel belongs to the background, while a low similarity suggests that it belongs to a moving object.
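The classification rule of (4)–(5) for a single pixel might be sketched as follows. The function name and the default values τ = 0.45 and ε = 0.5 are illustrative only; the paper describes both parameters as empirical:

```python
import numpy as np

def classify(f, bf, d, tau=0.45, eps=0.5):
    """Eq. (4)-(5): histogram-intersection-style similarity between the
    current FCCV f and the background model bf (both of length 2d),
    with eps weighting the coherent against the incoherent half.
    Returns 0 for background, 1 for foreground."""
    fc, fi = f[:d], f[d:]          # coherent / incoherent halves of F_g
    bc, bi = bf[:d], bf[d:]
    sim = (eps * np.minimum(fc, bc).sum() / max(fc.sum(), bc.sum())
           + (1 - eps) * np.minimum(fi, bi).sum() / max(fi.sum(), bi.sum()))
    return 0 if sim > tau else 1   # Eq. (4)
```

Identical feature vectors yield a similarity of exactly 1, so a background pixel with an unchanged neighborhood is always classified as background for any τ < 1.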
In a dynamic scene, the background changes over time. To take these changes into account and maintain a sound background model, the model should be updated online with the detection results of the current frame. The following update scheme is adopted for the gth pixel [16]:
$$BF_g(i) = \begin{cases} (1-\lambda)\, BF_g(i-1) + \lambda\, F_g(i), & \text{if } O_g = 0, \\ BF_g(i-1), & \text{if } O_g = 1, \end{cases} \tag{6}$$
where λ ∈ [0, 1] is the updating rate. The updated background model BF(i) is used to detect the objects in the (i+1)th frame.
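The selective update of (6) can be applied to the whole frame at once. In this sketch we assume the features are stored as an H × W × 2d array and the detection mask O as an H × W array; both layout and function name are our own:

```python
import numpy as np

def update_background(bf, f, o, lam=0.01):
    """Eq. (6): blend the new feature f into the model bf only where
    the pixel was classified as background (o == 0); foreground pixels
    keep their previous model values."""
    mask = (np.asarray(o) == 0)[..., None]   # broadcast over feature axis
    return np.where(mask, (1 - lam) * bf + lam * f, bf)
```

Freezing the model under foreground pixels prevents moving objects from being absorbed into the background, at the cost of never adapting to objects that stop permanently.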
The background subtraction algorithm is shown in Figure 2 and summarized as follows.
Figure 2: Flowchart of the proposed algorithm.
(A) Initialization Phase (for the first video frame, i=1)
(a) Convert RGB into HSV color space and quantize it into two gray images LHSV and LV with m = 128 color bins.
(b) Calculate the color coherence vectors of LHSV and LV, and then apply the FCM clustering method to the combined coherence and incoherence subvectors to obtain the membership matrices UC and UI, respectively.
(c) Derive the FCCV for each pixel using (3).
(d) Initialize the background model BF(1) with the FCCVs calculated in step (c).
(B) Detection Phase (for the ith frame, i=2,…,T)
(a) Convert RGB into HSV color space and quantize it into two gray images, as in the initialization phase.
(b) Calculate all feature vectors (FCCVs) of the ith frame using (3).
(c) Detect the moving objects by masking each pixel with "1" or "0" using (4) and the background model BF(i−1).
(d) Update the background model using (6).
4. Results and Discussion
In this section, we conduct experiments on several benchmark image sequences. To fully demonstrate the detection performance, the results of the proposed algorithm are compared with those of typical background subtraction methods, including frame difference (FD) [17], MoG [3, 17], FCH [12], FG [10, 17], and FASOM [11, 17]. The numbers of quantized color bins and clustering centers are chosen as m = 128 and d = 32, respectively. The updating rate λ is set to 0.01, and the local window for extracting the FCCV is 5 × 5 pixels. For a fair comparison, no postprocessing is used to improve the background subtraction results.
The benchmark image sequences include FT, WS, CT, and CA [18], WT [19], CJ [20], and CE and FT2 [21]. The frame counts of the original image sequences and the ground truths are listed in Table 1. Selected original video frames, ground truths, and the resulting foreground masks of these methods are shown in Figures 3 and 4. It can be seen that the proposed method provides a reliable background model, so that dynamic background, such as waving leaves and turbulent water, is almost completely separated from the moving objects. The detection results suggest that the introduced method effectively finds the moving objects and outperforms the existing methods.
Foreground detection can be considered a binary classification of each pixel. Given the ground truth, the measures recall and precision quantify the correctness of this classification and can therefore be used to evaluate background subtraction methods:
$$\text{recall} = \frac{\text{number of foreground pixels correctly identified}}{\text{number of foreground pixels in the ground truth}}, \qquad \text{precision} = \frac{\text{number of foreground pixels correctly identified}}{\text{number of foreground pixels detected}}. \tag{7}$$
For the quantitative evaluation of our algorithm, we use the composite F-measure [22],

$$\text{F-measure} = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}, \tag{8}$$

where higher values indicate better performance.
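The measures of (7)–(8) are straightforward to compute from a binary foreground mask and its ground truth; a small sketch (function name illustrative):

```python
import numpy as np

def f_measure(mask, gt):
    """Eq. (7)-(8): recall, precision and F-measure of a binary
    foreground mask against a ground-truth mask (1 = foreground)."""
    mask = np.asarray(mask, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(mask, gt).sum()        # correctly identified foreground
    recall = tp / max(gt.sum(), 1)             # Eq. (7)
    precision = tp / max(mask.sum(), 1)
    if recall + precision == 0:
        return 0.0, 0.0, 0.0
    return recall, precision, 2 * recall * precision / (recall + precision)
```

The `max(..., 1)` guards only against empty masks; the degenerate all-zero case returns 0 for all three measures.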
The F-measures of the background subtraction algorithms on the different image sequences are listed in Table 2. The quantitative analysis shows that the proposed FCCV achieves higher values on average, which indicates the better performance of the introduced method.
Table 2: F-measure.

Method   FT       WT       WS       CT       CA       CJ       CE       FT2
FD       0.3726   0.7257   0.4797   0.6719   0.1133   0.5481   0.3186   0.1632
MoG      0.2160   0.5497   0.1582   0.3844   0.2712   0.5629   0.1952   0.3242
FCH      0.6341   0.9617   0.9050   0.7553   0.6466   0.4915   0.7989   0.6231
FG       0.5014   0.7492   0.6735   0.8460   0.1495   0.8239   0.3148   0.2753
FASOM    0.7849   0.9289   0.8702   0.8771   0.1712   0.8138   0.5679   0.5625
FCCV     0.7458   0.9839   0.8826   0.8198   0.7575   0.9131   0.8188   0.6866
In our experiments, the foreground pixels are determined using (4) with the empirical parameter τ. To further evaluate the proposed algorithm, we also analyze the detection performance for different values of τ. The results for the image sequences WT, WS, and CA are shown in Figure 5. It can be seen that the detection performance changes only mildly when τ lies in the interval [0.4, 0.5], from which a proper value can be chosen for detecting the moving objects.
Figure 5: Detection performance for different values of the parameter τ.
5. Conclusions
The detection of moving objects in video frames plays a critical role in many applications. The traditional color histogram provides important color distribution information for moving object detection; however, it lacks the local spatial relationship information of colors, which can be captured by the CCV. Therefore, this paper proposes a background subtraction algorithm based on the FCCV, derived from the CCV by fuzzy c-means clustering, in which the introduced per-pixel feature vector FCCV is used to build the background model and detect the moving objects. In addition, the fuzzy nature of the FCCV makes the algorithm robust to environmental changes, so the method can detect moving objects against a dynamic background. The experimental results on several benchmark datasets demonstrate that the approach achieves better detection performance than several existing methods.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is partially supported by the National Natural Science Foundation of China 61371175, Postdoctoral Science-Research Developmental Foundation of Heilongjiang Province LBH-Q09128, and Fundamental Research Funds for the Central Universities HEUCFQ1411. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.
References

[1] A. Bugeau and P. Pérez, "Detection and segmentation of moving objects in complex scenes," Computer Vision and Image Understanding, vol. 113, no. 4, pp. 459–476, 2009.
[2] M. Heikkilä and M. Pietikäinen, "A texture-based method for modeling the background and detecting moving objects," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 657–662, 2006.
[3] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), Ft. Collins, Colo, USA, June 1999, pp. 246–252.
[4] N. Friedman and S. Russell, "Image segmentation in video sequences: a probabilistic approach," in Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence, Rhode Island, USA, 1997, pp. 175–181.
[5] A. Elgammal, D. Harwood, and L. Davis, "Non-parametric model for background subtraction," in Proceedings of the European Conference on Computer Vision, Dublin, Ireland, June 2000, pp. 751–767.
[6] L. Cheng, M. Gong, D. Schuurmans, and T. Caelli, "Real-time discriminative background subtraction," IEEE Transactions on Image Processing, vol. 20, no. 5, pp. 1401–1414, 2011.
[7] M. Mason and Z. Duric, "Using histograms to detect and track objects in color video," in Proceedings of the 30th Applied Imagery Pattern Recognition Workshop, Washington, DC, USA, October 2001, pp. 154–159.
[8] S. Zhang, H. Yao, and S. Liu, "Dynamic background modeling and subtraction using spatio-temporal local binary patterns," in Proceedings of the IEEE International Conference on Image Processing (ICIP '08), San Diego, Calif, USA, October 2008, pp. 1556–1559.
[9] S. Zhang, H. Yao, S. Liu, X. Chen, and W. Gao, "A covariance-based method for dynamic background subtraction," in Proceedings of the 19th International Conference on Pattern Recognition (ICPR '08), Tampa, Fla, USA, December 2008, pp. 1–4.
[10] M. H. Sigari, N. Mozayani, and H. R. Pourreza, "Fuzzy running average and fuzzy background subtraction: concepts and application," vol. 8, no. 2, pp. 138–143, 2008.
[11] L. Maddalena and A. Petrosino, "A fuzzy spatial coherence-based approach to background/foreground separation for moving object detection," Neural Computing and Applications, vol. 19, no. 2, pp. 179–186, 2010.
[12] W. Kim and C. Kim, "Background subtraction for dynamic texture scenes using fuzzy color histograms," IEEE Signal Processing Letters, vol. 19, no. 3, pp. 127–130, 2012.
[13] G. Pass and R. Zabih, "Histogram refinement for content-based image retrieval," in Proceedings of the 3rd IEEE Workshop on Applications of Computer Vision (WACV '96), December 1996, pp. 96–102.
[14] S. Niranjanan and S. P. R. Gopalan, "Performance efficiency of quantization using HSV colour space and intersection distance in CBIR," vol. 42, no. 21, pp. 48–55, 2012.
[15] C. Huang and G. Wang, "Method of image retrieval based on color coherence vector," vol. 32, no. 2, pp. 194–199, 2006.
[16] P. Chiranjeevi and S. Sengupta, "New fuzzy texture features for robust detection of moving objects," IEEE Signal Processing Letters, vol. 19, no. 10, pp. 603–606, 2012.
[17] A. Sobral, "BGSLibrary: a background subtraction library," in Proceedings of the 9th Workshop de Visão Computacional (WVC '13), Rio de Janeiro, Brazil, 2013.
[18] L. Y. Li, W. M. Huang, I. Y.-H. Gu, and Q. Tian, "Statistical modeling of complex backgrounds for foreground object detection," IEEE Transactions on Image Processing, vol. 13, no. 11, pp. 1459–1472, 2004.
[19] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: principles and practice of background maintenance," in Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV '99), Kerkyra, Greece, September 1999, pp. 255–261.
[20] Y. Sheikh and M. Shah, "Bayesian modeling of dynamic scenes for object detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 11, pp. 1778–1792, 2005.
[21] N. Goyette, P. M. Jodoin, F. Porikli, J. Konrad, and P. Ishwar, "Changedetection.net: a new change detection benchmark dataset," in Proceedings of the IEEE Workshop on Change Detection (CDW '12), 2012, pp. 1–8.
[22] S. Brutzer, B. Höferlin, and G. Heidemann, "Evaluation of background subtraction techniques for video surveillance," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), Colorado Springs, Colo, USA, June 2011, pp. 1937–1944.