A Multi-Model Stereo Similarity Function Based on Monogenic Signal Analysis in Poisson Scale Space

A stereo similarity function based on local multi-model monogenic image feature descriptors (LMFD) is proposed to match interest points and estimate the disparity map for stereo images. Local multi-model monogenic image features include the local orientation and instantaneous phase of the gray monogenic signal, the local color phase of the color monogenic signal, and the local mean colors in the multiscale color monogenic signal framework. The gray monogenic signal, which is the extension of the analytic signal to gray level images using the Dirac operator and the Laplace equation, consists of the local amplitude, local orientation, and instantaneous phase of the 2D image signal. The color monogenic signal is the extension of the monogenic signal to color images based on Clifford algebras. The local color phase can be estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space. Experimental results on synthetic and natural stereo images show the performance of the proposed approach.


Introduction
Shape and motion estimation from stereo images has been one of the core challenges in computer vision for decades. The robust and accurate computation of stereo depth is an important problem for many visual tasks such as machine vision, virtual reality, robot navigation, simultaneous localization and mapping, depth measurement, and 3D environment reconstruction. Most conventional approaches, such as intensity-based or correlation-based matching, feature-based matching, and matching function optimization techniques, estimate the disparity based only on local intensities and features of the stereo images, so the results may be susceptible to level shift, scaling, rotation, and noise [1, 2].
To overcome these drawbacks, we propose a new method for establishing spatial correspondences between a pair of color images. Unlike classical stereo-matching methods based on the brightness constancy assumption and the phase congruency constraint, we match feature points and estimate the disparity map between stereo images based on new local multi-model monogenic image feature descriptors in the color monogenic signal framework [3, 4]. We first introduce the monogenic signal [5] of a 2D gray level image using the Dirac operator and the Laplace equation and extract the local amplitude, local orientation, and instantaneous phase information in multiscale space [6]. At the same time, the 2D monogenic signal is extended to color images, and the color monogenic signal is introduced based on Clifford algebras. The local color phase is estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space with Clifford algebras [3, 4]. Then we define new local multi-model monogenic image feature descriptors which contain local geometric (local orientation), structure (instantaneous phase), and color (local color phase and color values) information in the color monogenic signal framework. Based on the proposed image feature descriptors, a stereo similarity function between two primitives is also defined to solve the stereo correspondence problem. Finally, we test the performance of the proposed approach on synthetic and natural stereo images, and the experimental results are given in detail.

Modeling 2D Image Signal
Based on the results of Fourier theory and functional analysis, we assume that each 2D signal $f \in L^2(\mathbb{R}^2) \cap L^1(\mathbb{R}^2)$ can be locally modeled by a superposition of arbitrarily oriented one-dimensional cosine waves [6]:

$$f_p(x, y, s) = (p_s * f)(x, y) = a(x, y, s) \sum_{\nu=1}^{n} \cos\big(\langle (x, y)^T, o_\nu(x, y, s) \rangle + \phi(x, y, s)\big) \tag{2.1}$$

with $*$ as the convolution operator and the orientation $o_\nu(x, y, s) = [\cos\theta_\nu(x, y, s), \sin\theta_\nu(x, y, s)]^T$. Note that each cosine wave is determined with the same amplitude and phase information. The Poisson convolution kernel reads

$$p_s(x, y) = \frac{s}{2\pi\,(s^2 + x^2 + y^2)^{3/2}}. \tag{2.2}$$

For a certain scale space parameter $s > 0$, the Poisson kernel acts as a low pass filter on the original signal $f$. The Poisson scale space is naturally related to the generalized Hilbert transform by the Cauchy kernel. To filter a frequency interval of interest, the difference of Poisson (DoP) kernel will be used in practice:

$$p_{s_f, s_c}(x, y) = p_{s_f}(x, y) - p_{s_c}(x, y) \tag{2.3}$$

with $s_c > s_f > 0$, where $s_c$ is the coarse scale parameter and $s_f$ is the fine scale parameter. The filtered signal is defined by convolving the original signal with the DoP kernel, so that only a small passband of the original signal spectrum is considered. Without loss of generality, for a single cosine wave ($n = 1$) the signal model in (2.1) reduces locally at the origin $(x, y) = (0, 0)$ of a local coordinate system to

$$f_p(x, y, s) = a(x, y, s) \cos\phi(x, y, s). \tag{2.4}$$
In image analysis, lines, edges, junctions, and corners can be modeled in this way. The signal processing task is now to determine the local amplitude $a(x, y, s)$, the local orientation $\theta_\nu(x, y, s)$, and the local phase $\phi(x, y, s)$ for a certain scale space parameter $s$ and a certain location $(x, y)$. This problem has already been solved for one-dimensional signals by the classical analytic signal [7] by means of the Hilbert transform [8], and for intrinsically one-dimensional signals [9] by the two-dimensional monogenic signal by means of the generalized first-order Hilbert transforms [10].
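The band-pass filtering with the difference of Poisson kernel can be sketched in the frequency domain, where the 2D Poisson kernel at scale $s$ has the transfer function $\exp(-2\pi \lVert u \rVert s)$. This is a minimal illustration rather than the implementation used in the paper: the function names `poisson_transfer` and `dop_filter` are hypothetical, and frequencies are assumed to be in cycles per pixel with periodic boundary handling (the FFT default).

```python
import numpy as np


def poisson_transfer(shape, s):
    """Frequency response exp(-2*pi*|u|*s) of the 2D Poisson kernel at scale s."""
    u = np.fft.fftfreq(shape[1])[None, :]   # horizontal frequencies (cycles/pixel)
    v = np.fft.fftfreq(shape[0])[:, None]   # vertical frequencies (cycles/pixel)
    radius = np.sqrt(u ** 2 + v ** 2)
    return np.exp(-2.0 * np.pi * radius * s)


def dop_filter(image, s_fine, s_coarse):
    """Band-pass an image with the difference of Poisson kernel (fine minus coarse)."""
    if not (s_coarse > s_fine > 0):
        raise ValueError("expected s_coarse > s_fine > 0")
    response = poisson_transfer(image.shape, s_fine) - poisson_transfer(image.shape, s_coarse)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * response))
```

Because both Poisson responses equal one at zero frequency, their difference removes the mean gray level, which is one reason a DoP-filtered signal is insensitive to a constant level shift.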

The Analytic Signal
Let $s: \mathbb{R} \to \mathbb{R}$ be a real-valued signal, and let $f: \mathbb{R} \to \mathbb{R}_{2,0}$ [3] be a vector-valued signal such that $f(x) = s(x) e_2$ [4]. The purpose is to construct a function that fulfills the Dirac equation and whose real part is the real-valued signal. This is equivalent to finding the solution of a boundary value problem of the second kind (a Neumann problem), referred to as (2.5). The first equation in (2.5) is the 2D Laplace equation restricted to the open domain $y > 0$. The second equation is called the boundary condition, and the basis vector $e_2$ is coherent with the embedding of complex functions as fields (the real part is embedded as the $e_2$-component). Using the fundamental solution of the 2D Laplace equation, the solution of the problem leads to

$$f_A(x, y) = (p_1 * f)(x) + (p_1 * h_1 * f)(x) \tag{2.6}$$

where $p_1(x) = \dfrac{y}{\pi (x^2 + y^2)}$ is the 1D Poisson kernel and $h_1(x) = \dfrac{1}{\pi x} e_{12}$ is the Hilbert kernel. The variable $y$ is a scale parameter; taking it equal to zero yields the classical analytic signal.
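As a small numerical illustration of the construction above, the scale-space analytic signal of a sampled 1D signal can be computed with the FFT: the Poisson kernel becomes the factor $\exp(-2\pi |u| y)$ and the Hilbert kernel becomes $-i\,\mathrm{sign}(u)$. The sketch below stores the result as an ordinary complex array instead of using the Clifford basis; the function name `analytic_signal` and the periodic boundary assumption are ours.

```python
import numpy as np


def analytic_signal(signal, y=0.0):
    """Scale-space analytic signal of a 1D array; y = 0 gives the classical one."""
    u = np.fft.fftfreq(signal.size)                            # cycles per sample
    spectrum = np.fft.fft(signal) * np.exp(-2.0 * np.pi * np.abs(u) * y)
    even = np.real(np.fft.ifft(spectrum))                      # Poisson-smoothed signal
    odd = np.real(np.fft.ifft(spectrum * (-1j) * np.sign(u)))  # its Hilbert transform
    return even + 1j * odd


t = np.arange(64)
wave = np.cos(2.0 * np.pi * 4.0 * t / 64.0)
z = analytic_signal(wave)
local_amplitude = np.abs(z)    # constant amplitude of the cosine
local_phase = np.angle(z)      # phase of the cosine, wrapped to (-pi, pi]
```

For this pure cosine the amplitude is constant and the phase grows linearly (modulo $2\pi$), which is the split into energetic and structural information that the monogenic signal generalizes to images.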

Instantaneous Phase of the Monogenic Signal
Following the previous construction of the analytic signal, Michael Felsberg and Gerald Sommer proposed an extension to 2D signals and defined the monogenic signal, which is the combination of a gray image with its Riesz transform [5]. Let $s: \mathbb{R}^2 \to \mathbb{R}$ be a real-valued signal and $f: \mathbb{R}^2 \to \mathbb{R}_{3,0}$ a vector-valued signal such that $f(x, y) = f_3(x, y) e_3$, where $\{e_i\}$, $i = 1, 2, 3$, is the orthonormal basis of $\mathbb{R}^3$. According to the 3D Laplace equation restricted to the open half-space $z > 0$ and the boundary condition of the second kind, we can obtain the monogenic signal as follows:

$$f_M(x, y, s) = f_p(x, y, s) + f_q(x, y, s), \qquad f_p = p_s * f, \qquad f_q = p_s * h_R * f = (f_{q,x}, f_{q,y}),$$

where $h_R$ is the Riesz kernel. With $a(x, y, s)$, $\theta(x, y, s)$, and $\phi(x, y, s)$ representing the local amplitude, local orientation, and instantaneous phase, respectively:

$$a = \sqrt{f_p^2 + |f_q|^2}, \qquad \theta = \arctan\frac{f_{q,y}}{f_{q,x}}, \qquad \phi = \arctan\frac{|f_q|}{f_p}. \tag{2.9}$$

However, we do not know the correct sign of the phase since it depends on the directional sense of $\theta(x, y, s)$. The best possible solution is to project it onto $(\cos\theta, \sin\theta)$ as

$$\phi(x, y, s) = \frac{f_q(x, y, s)}{|f_q(x, y, s)|} \arg\big(f_p(x, y, s) + i\,|f_q(x, y, s)|\big). \tag{2.10}$$
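The monogenic features described above can be sketched with FFT-based filters, using the standard frequency response $-i\,u/\lVert u \rVert$ of the Riesz transform and $\exp(-2\pi \lVert u \rVert s)$ of the Poisson kernel. This is an illustrative sketch under our own assumptions (periodic boundaries, a single Poisson scale instead of a DoP band-pass, hypothetical function name `monogenic_signal`); it returns the unsigned phase in $[0, \pi]$ and leaves the sign convention of (2.10) to the caller.

```python
import numpy as np


def monogenic_signal(image, s):
    """Local amplitude, orientation and (unsigned) phase of a gray image at scale s."""
    rows, cols = image.shape
    u = np.fft.fftfreq(cols)[None, :]
    v = np.fft.fftfreq(rows)[:, None]
    radius = np.sqrt(u ** 2 + v ** 2)
    safe = np.where(radius > 0, radius, 1.0)        # avoid 0/0 at the DC term
    spectrum = np.fft.fft2(image) * np.exp(-2.0 * np.pi * radius * s)

    f_p = np.real(np.fft.ifft2(spectrum))                        # even (scalar) part
    f_qx = np.real(np.fft.ifft2(spectrum * (-1j) * u / safe))    # Riesz x-component
    f_qy = np.real(np.fft.ifft2(spectrum * (-1j) * v / safe))    # Riesz y-component

    q_norm = np.hypot(f_qx, f_qy)
    amplitude = np.sqrt(f_p ** 2 + q_norm ** 2)
    orientation = np.arctan2(f_qy, f_qx)
    phase = np.arctan2(q_norm, f_p)                 # in [0, pi]
    return amplitude, orientation, phase
```

Calling the function at several scales (for example three, as in the experiments) gives one amplitude, orientation, and phase map per scale.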

Local Color Phase of the Color Monogenic Signal
In 2009, Demarcq et al. constructed a scale-space signal for color images seen as vectors in $\mathbb{R}_{5,0}$ [4]. Let $s: \mathbb{R}^2 \to \mathbb{R}^3$ be a real-valued signal and $f: \mathbb{R}^2 \to \mathbb{R}_{5,0}$ a vector-valued signal. A color image is then decomposed in the RGB space, represented as the subspace spanned by $\{e_3, e_4, e_5\}$. We now need to find a function that is monogenic and whose $e_3$-, $e_4$-, and $e_5$-components are the components $r$, $g$, and $b$, respectively. This leads to a system of boundary value problems, one for each color component, referred to as (2.12).

Each solution of the system in (2.12) leads to one of the monogenic functions $S_1$, $S_2$, $S_3$: each satisfies the Dirac equation in its subspace $E_i = \mathrm{span}\{e_1, e_2, e_i\}$, $i = 3, 4, 5$, and consequently the Dirac equation in $\mathbb{R}_{5,0}$ ($D S_i = 0$). Let $f_C = S_1 + S_2 + S_3$; then $f_C$ is still monogenic in $\mathbb{R}_{5,0}$ (i.e., $D f_C = 0$) and satisfies the boundary conditions. So the scale-space color monogenic signal can be obtained by using the Dirac operator and the Laplace equation as follows:

$$f_C = \sum_{i=3}^{5} \Big[ (h_x * p_{s_i} * f_i)\, e_1 + (h_y * p_{s_i} * f_i)\, e_2 + (p_{s_i} * f_i)\, e_i \Big]$$

where $p_{s_i}$, $i = 3, 4, 5$, is a 2D Poisson kernel and $h_R = (h_x, h_y)$ is the Riesz kernel. As for the analytic or monogenic signal, a color image $f$ can be represented in terms of local amplitude and local phase. Our proposal is to use the geometric product in order to compare two vectors in $\mathbb{R}_{5,0}$.
In the Clifford algebra of the Euclidean vector space $\mathbb{R}^n$, the product of two vectors $a$ and $b$, embedded in $\mathbb{R}_{n,0}$, is given by [3, 4]

$$ab = a \cdot b + a \wedge b$$

where $a \cdot b$ is the inner product and $a \wedge b$, the wedge product of $a$ and $b$, is a bivector. This product is usually called the geometric product of $a$ and $b$. The inner product is symmetric, and the wedge product is skew symmetric. If $V = u e_1 + v e_2 + a e_3 + b e_4 + c e_5 \in \mathbb{R}_{5,0}$ is a chosen vector containing structure information $(u, v)$ and color information $(a, b, c)$, then the geometric product of $f_C$ and $V$ can be given by

$$f_C V = \langle f_C V \rangle_0 + \langle f_C V \rangle_2 = \langle f_C V \rangle_0 + \frac{\langle f_C V \rangle_2}{|\langle f_C V \rangle_2|}\, |\langle f_C V \rangle_2| \tag{2.15}$$

where $\langle \cdot \rangle_0$ denotes the scalar part, $\langle \cdot \rangle_2$ the bivector part, and $|\cdot|$ the magnitude of the bivector part [6]. According to Clifford algebras, the geometric product reveals the relationship between bivectors and complex numbers [3]. This means that we can form the equivalent of a complex number by combining a scalar and a unit bivector. The local color phase can be computed as follows:

$$\varphi_c = \arg(f_C V) = \arctan\frac{|\langle f_C V \rangle_2|}{\langle f_C V \rangle_0}.$$

This phase describes the angular distance between $f_C$ and a given vector $V$ in $\mathbb{R}_{5,0}$; that is, it gives a correlation measure between a pixel fitted with color and structure information and a vector containing the chosen color and structure.
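For two vectors, the scalar part of the geometric product is the dot product and the squared magnitude of the bivector part follows from the identity $|a \wedge b|^2 = |a|^2 |b|^2 - (a \cdot b)^2$, so the local color phase can be evaluated without an explicit Clifford algebra library. The sketch below assumes the five components of $f_C$ are stored in the last array axis; the function name `local_color_phase` is hypothetical.

```python
import numpy as np


def local_color_phase(f_c, v):
    """Angle in [0, pi] between 5-vectors f_c (..., 5) and a reference vector v (5,)."""
    f_c = np.asarray(f_c, dtype=float)
    v = np.asarray(v, dtype=float)
    scalar = np.sum(f_c * v, axis=-1)                          # scalar part <f_c V>_0
    norm_sq = np.sum(f_c ** 2, axis=-1) * np.sum(v ** 2, axis=-1)
    bivector_mag = np.sqrt(np.maximum(norm_sq - scalar ** 2, 0.0))  # |<f_c V>_2|
    return np.arctan2(bivector_mag, scalar)


# Reference vector with no structure part and a gray color direction.
reference = np.array([0.0, 0.0, 1.0, 1.0, 1.0]) / np.sqrt(3.0)
gray_pixel = np.array([0.0, 0.0, 0.5, 0.5, 0.5])
red_pixel = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
phase_gray = local_color_phase(gray_pixel, reference)   # close to 0
phase_red = local_color_phase(red_pixel, reference)     # arccos(1 / sqrt(3))
```

Using `arctan2` instead of a plain arctangent keeps the result in $[0, \pi]$ even when the scalar part is zero or negative.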

Local Multi-Model Monogenic Image Feature Descriptors
We make use of a visual representation [11] which is motivated by feature processing in the human visual system and define new local multi-model monogenic image descriptors, which give an explicit and condensed representation of the local image signal (3.1). In fact, this representation performs a considerable condensation of information in a local image patch of $n \times n$ pixels ($n > 1$). The symbol $X = (x, y)$ represents the central coordinates of the local image patch, $\phi$ is the instantaneous phase of the gray monogenic signal, $\varphi_c$ is the local color phase, and $C = (r, g, b)$ is the color values in RGB color space. Based on the local multi-model image feature descriptors, the stereo similarity function between two primitives (3.2) is the weighted sum of squared differences of instantaneous phase, local color phase, and color vector for a pair of stereo images in the local patch. Here $l$ and $r$ denote the left and right images, respectively, $d_\phi \in [0, \pi]$ is the distance measurement of the instantaneous phase $\phi \in (-\pi, \pi]$, $d_\varphi \in [0, \pi]$ is the distance measurement of the local color phase $\varphi_c \in [0, \pi]$, and $d_c \in [0, \sqrt{3}]$ is the distance measurement of the color vector with $C \in [0, 1] \times [0, 1] \times [0, 1]$ and $c_R, c_G, c_B = 1$ in RGB color space. The symbols $\alpha$, $\beta$, $\gamma$ are weighted coefficients with $\alpha, \beta \in [0, 1]$ and $\gamma \in [0, 0.5]$. In order to achieve a better coherence with the real scene, we use an adaptive support-weight technique in the local patch [12]. The support weight for each pixel in the window is calculated based on the Gestalt principles, which state that the grouping of pixels should be based on spatial proximity and chromatic similarity. The original formulation is given in [13] and is referred to as (3.3) and (3.4): the support weight of a pixel $(x + m, y + n)$ in the window centered at $(x, y)$ depends on the chromatic difference between the two pixels and on the Euclidean distance between them, $\gamma_c$ and $\gamma_g$ are user-defined parameters, and $E(x_l, y_l, x_r, y_r)$ is the aggregated cost between pixel $(x_l, y_l)$ in the left image and $(x_r, y_r)$ in the right image.
The proposed feature descriptors have several merits. Firstly, the instantaneous phase contains local orientation (geometric) information and information about contrast transition, so that it describes an intrinsically one-dimensional structure in a gray level image, that is, an image structure that is dominated by one orientation. Examples of different contrast transitions are a dark/bright (bright/dark) edge or a bright (dark) line on a dark (bright) background. Of course, there is a continuum between these different gray level structures. The instantaneous phase as an additional feature allows us to take this information into account as one parameter in addition to orientation in a compact way [11]. Secondly, the local color phase describes the angular distance between the color monogenic signal $f_C$ of the color image and a given color vector $V$ in $\mathbb{R}_{5,0}$ and gives a correlation measure between a pixel fitted with color and structure information and a vector containing the chosen color and structure [14]. Finally, the color vector indicates the mean color structure of the local image, because color is also an important cue for improving stereo matching. Because the stereo similarity function with the local multi-model feature descriptors contains local geometric, structure, and color information, it is more robust against noise and brightness change than the individual distance measurements in feature matching and 3D reconstruction.
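The similarity function and the support-weight aggregation described in this section can be sketched as follows. The exact formulas are not reproduced here, so several choices are assumptions on our part: the phase distance is the wrapped absolute difference (which lies in $[0, \pi]$), the color distance is the Euclidean distance in the unit RGB cube (which lies in $[0, \sqrt{3}]$), and the support weights follow the widely used exponential form $\exp(-(\Delta c / \gamma_c + \Delta g / \gamma_g))$. All function names are hypothetical.

```python
import numpy as np


def phase_distance(phi_l, phi_r):
    """Wrapped distance in [0, pi] between phases given in (-pi, pi]."""
    d = np.abs(phi_l - phi_r) % (2.0 * np.pi)
    return np.minimum(d, 2.0 * np.pi - d)


def matching_cost(phi_l, phi_r, cphase_l, cphase_r, color_l, color_r,
                  alpha=1.0, beta=1.0, gamma=0.5):
    """Weighted sum of squared phase, color-phase and color distances."""
    d_phi = phase_distance(phi_l, phi_r)
    d_cphase = np.abs(cphase_l - cphase_r)
    d_color = np.linalg.norm(np.asarray(color_l) - np.asarray(color_r), axis=-1)
    return alpha * d_phi ** 2 + beta * d_cphase ** 2 + gamma * d_color ** 2


def support_weights(patch_rgb, gamma_c, gamma_g):
    """Weights of an n x n RGB patch relative to its center pixel (assumed form)."""
    n = patch_rgb.shape[0]
    center = n // 2
    rows, cols = np.mgrid[0:n, 0:n]
    spatial = np.hypot(rows - center, cols - center)
    chromatic = np.linalg.norm(patch_rgb - patch_rgb[center, center], axis=-1)
    return np.exp(-(chromatic / gamma_c + spatial / gamma_g))


def aggregated_cost(raw_cost, w_left, w_right):
    """Support-weighted average of the per-pixel costs inside the window."""
    weights = w_left * w_right
    return np.sum(weights * raw_cost) / np.sum(weights)
```

Setting one of `alpha`, `beta`, or `gamma` to zero switches off the corresponding cue, which is how the individual distance measurements can be compared against the combined function.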

Experimentation Results
Once the similarity function is given, a minimization process can be performed to find the optimal disparity. In order to reduce noise sensitivity and simultaneously achieve higher efficiency, the multiscale space and a winner-take-all technique are employed to optimize the disparity map for stereo matching. We test the performance of the proposed local multi-model feature descriptors and similarity function on synthetic and natural stereo images. In the first step, we estimate and compare the disparity maps of a pair of synthetic images using each of the three distance measurements and the proposed stereo similarity function. In the second step, we reconstruct the 3D shape and appearance of a natural object and scene by combining the proposed stereo similarity function with a multiview stereo technique [15].
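The winner-take-all step mentioned above simply keeps, for every pixel, the candidate disparity with the lowest aggregated cost. A minimal sketch, assuming the costs are stored in a hypothetical array `cost_volume[k, y, x]` holding the cost of candidate `disparities[k]` at pixel `(y, x)`:

```python
import numpy as np


def winner_take_all(cost_volume, disparities):
    """Pick, per pixel, the candidate disparity with the minimum aggregated cost."""
    disparities = np.asarray(disparities)
    best = np.argmin(cost_volume, axis=0)    # index of the cheapest candidate
    return disparities[best]


# Two pixels, three candidate disparities.
costs = np.array([[[3.0, 1.0]],
                  [[2.0, 5.0]],
                  [[0.0, 4.0]]])
disparity_map = winner_take_all(costs, [0, 1, 2])   # [[2, 0]]
```

In a coarse-to-fine scheme the candidate list at a finer scale would be restricted to a small range around the disparity found at the coarser scale.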

Disparity Estimation Experiment
In the first experiment, we chose the pair of color images Cloth1 from the website [16]; the left image, right image, and ground-truth disparity map of the stereo pair are shown in Figures 1(a)-1(c), respectively. Firstly, the left and right color images are converted to gray level images. The gray monogenic signals with $s = 3$ scales are computed, and the instantaneous phases are estimated for both gray images in Poisson scale space. Figure 2 shows three instantaneous phase maps of the left image. At the same time, the scale-space color monogenic signals with $s = 3$ scales are computed, and a chosen reference vector $V = (e_3 + e_4 + e_5)/\sqrt{3}$ with local geometric structure $(u, v) = (0, 0)$ and unit vector $(a, b, c) = (1, 1, 1)/\sqrt{3}$ in RGB color space is used to estimate the local color phase for both images [14]. Figure 3 shows three color phase maps of the left image.
Secondly, the proposed stereo similarity function based on local multi-model monogenic image descriptors is employed to compute the weighted sum of squared differences of instantaneous phase, local color phase, and color vector. The scale-space color monogenic framework [14] and a local optimization technique [17] are employed to optimize the disparity map according to the adaptive support-weight cost aggregation as in (3.4). In the scale space, the disparity field at the coarser scale is used as the starting guess for estimation at the next finer scale, so that a subpixel disparity can be obtained. For example, we estimate the disparity map at the third scale using the cost functions (3.2) and (3.4). The disparities at the third scale directly subtending the second scale under scrutiny are all used as candidate starting guesses. The candidate leading to the best match is accepted for the regularization step. Estimation proceeds in this fashion, decrementing scale from coarser to finer, with matching followed by regularization, until the finest level of detail is reached. Figure 4 shows the dense disparity map based on the proposed algorithm.

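The root mean square error (RMSE) and the percentage of bad disparities (PBD) are computed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(x, y)} \big| d_E(x, y) - d_G(x, y) \big|^2}, \qquad \mathrm{PBD} = \frac{1}{N} \sum_{(x, y)} \big( \big| d_E(x, y) - d_G(x, y) \big| > \zeta \big) \tag{4.1}$$

where $d_E$ and $d_G$ are the estimated and ground-truth disparity maps, $N$ is the total number of pixels in an image, and $\zeta$ represents the disparity error tolerance. The RMSE and PBD statistics for all of the above algorithms are presented in Table 1. As can be seen in Table 1, the proposed algorithm with the weighted coefficients $\alpha = 1$, $\beta = 1$, $\gamma = 0.5$ obtains better results (RMSE = 1.82 and PBD = 0.09) than the others.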
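The two error statistics can be computed directly from an estimated and a ground-truth disparity map. The sketch below assumes both maps are arrays of the same shape and returns PBD as a fraction of pixels; the function name `disparity_errors` and the default tolerance of one pixel are our own choices.

```python
import numpy as np


def disparity_errors(d_est, d_gt, tolerance=1.0):
    """Return (RMSE, PBD) between estimated and ground-truth disparity maps."""
    diff = np.abs(np.asarray(d_est, dtype=float) - np.asarray(d_gt, dtype=float))
    rmse = np.sqrt(np.mean(diff ** 2))
    pbd = np.mean(diff > tolerance)     # fraction of pixels with error above tolerance
    return rmse, pbd


estimated = np.array([[1.0, 2.0], [3.0, 7.0]])
ground_truth = np.array([[1.0, 2.0], [3.0, 4.0]])
rmse, pbd = disparity_errors(estimated, ground_truth)   # 1.5 and 0.25
```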

The Robustness against Noise and Brightness
We now investigate the robustness of the proposed approach against noise and brightness change. In the first step, Gaussian noise with $\sigma = 0.05, 0.10, 0.15, 0.20, 0.25, 0.30$ is added to both color images, and the disparity maps are estimated using the proposed method with four different sets of weighted coefficients. The root mean square error (RMSE) and the percentage of bad disparities (PBD) for each noisy disparity map are also calculated using (4.1). In the second step, brightness offsets $I = -30, -20, -10, 10, 20, 30$ are added to the right image. We again compute the disparity maps, RMSE, and PBD with four different sets of weighted coefficients. The experimental results in Table 2 show that the proposed approach with $\alpha = 1$, $\beta = 1$, $\gamma = 0.5$ is largely insensitive to noise and brightness change.
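The two perturbations used in this robustness test can be generated as sketched below. The intensity conventions are assumptions, since they are not stated above: the noise level $\sigma$ is taken relative to intensities scaled to $[0, 1]$, and the brightness offsets are taken on the 8-bit scale $[0, 255]$. The helper names are hypothetical.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to a float image with values in [0, 1]."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


def shift_brightness(image_8bit, offset):
    """Add a constant brightness offset to an 8-bit image, saturating at 0 and 255."""
    shifted = image_8bit.astype(np.int16) + offset
    return np.clip(shifted, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
left = rng.random((4, 4, 3))
noisy_left = add_gaussian_noise(left, 0.05, rng)
right_8bit = np.array([[250, 5], [100, 200]], dtype=np.uint8)
brighter = shift_brightness(right_8bit, 30)     # [[255, 35], [130, 230]]
darker = shift_brightness(right_8bit, -30)      # [[220, 0], [70, 170]]
```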

3D Reconstruction Experimentation
In the last experiment, the proposed approach is used to reconstruct a set of oriented points for 3D natural scenes by multiview stereopsis. We first captured stereo images of size 1024 × 768 around a 3D object and scene (a static color paper cup and a deforming color texture paper) under natural light with two color cameras (AVT Guppy F146C). We also calibrated the intrinsic and external parameters for all cameras and images and estimated the corresponding epipolar lines among the images. Feature points for all images are detected using Speeded-Up Robust Features (SURF) with a blob response threshold of 1000 [21] and are matched using the corresponding epipolar constraints and the proposed similarity function with $\alpha = 1$, $\beta = 1$, $\gamma = 0.5$ in (3.2) and (3.4). From these initially matched feature points, we reconstruct a set of sparse 3D oriented points for the object and scene, similar to multiview stereopsis in [15]. Then we expand and filter it into a set of robust, dense 3D oriented points with image cells of 2 × 2 pixels and reconstruct the three-dimensional surface of the object. For the static color paper cup, we capture 16 images and reconstruct the dense 3D oriented points and shape. Figure 5 shows the captured images, the dense 3D oriented points, and the three-dimensional surface at the 1st, 7th, and 13th views, respectively. For the deforming color texture paper, we capture the stereo sequence using two cameras at a rate of 7 frames per second and reconstruct the time-varying 3D shape and motion. Figure 6 shows the captured images of one camera and the time-varying 3D shape at the 0th, 20th, 40th, 60th, 80th, and 100th frames.
To further validate the performance of the proposed method, a new set of dense 3D patches for the static color paper cup is also reconstructed with the SSD-based photometric discrepancy function in [15]. For both algorithms, we calculated the mean feature points (MFP) and percentage of bad disparities (PBD) of each image, the number of 3D oriented points at the initial matching (MAT 3D), expanding (EXP 3D), and filtering (FIL 3D) stages, and the percentage of image cells that are not reconstructed (PNR). Statistical results are shown in Table 3.
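The epipolar constraint used when matching feature points can be checked with the distance between a candidate point in the right image and the epipolar line of its partner in the left image. The sketch assumes a known fundamental matrix `F` with the convention $x_r^T F x_l = 0$; the function name and the rectified example matrix are illustrative only and are not taken from the calibration used in the experiment.

```python
import numpy as np


def epipolar_distance(F, pt_left, pt_right):
    """Distance (in pixels) of pt_right from the epipolar line F @ pt_left."""
    x_l = np.array([pt_left[0], pt_left[1], 1.0])
    x_r = np.array([pt_right[0], pt_right[1], 1.0])
    line = F @ x_l                       # line a*x + b*y + c = 0 in the right image
    return abs(x_r @ line) / np.hypot(line[0], line[1])


# Fundamental matrix of an ideal rectified pair: epipolar lines are image rows.
F_rectified = np.array([[0.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0],
                        [0.0, 1.0, 0.0]])
on_line = epipolar_distance(F_rectified, (10.0, 20.0), (4.0, 20.0))    # 0.0
off_line = epipolar_distance(F_rectified, (10.0, 20.0), (4.0, 23.0))   # 3.0
```

Candidate matches whose distance exceeds a small pixel threshold can be discarded before the similarity function is evaluated, which reduces the number of comparisons.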

Conclusions
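In this paper, we propose a stereo similarity function based on local multi-model monogenic image feature descriptors to solve the stereo-matching problem. Local multi-model monogenic image features include the local orientation and instantaneous phase of the gray monogenic signal, the local color phase of the color monogenic signal, and the local mean colors in the multiscale color monogenic signal framework. The gray monogenic signal, which is the extension of the analytic signal to gray level images using the Dirac operator and the Laplace equation, consists of the local amplitude, local orientation, and instantaneous phase of the 2D image signal. The color monogenic signal is the extension of the monogenic signal to color images based on Clifford algebras. The local color phase can be estimated by computing the geometric product between the color monogenic signal and a unit reference vector in RGB color space. Experimental results on synthetic and natural stereo images show the performance of the proposed approach. A shortcoming is that the proposed method needs much more run time and storage space. In future work, we will therefore focus on improving the efficiency of the proposed method and evaluating it further.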

Figure 1: Stereo pair of Cloth1. (a) Left image, (b) right image, and (c) ground-truth disparity map.

Figure 3: Local-color-phase maps. (a) First scale, (b) second scale, and (c) third scale.
Figures 4(a)-4(d) show four estimated disparity maps with different weighted coefficients $(\alpha, \beta, \gamma)$. Thirdly, a comparison with a number of selected algorithms from the literature is performed to further validate the claims about the performance of the proposed algorithm.

Figure 5: The 3D dense oriented points reconstruction for the color paper cup. (a)-(c) The captured images, (d)-(f) the 3D dense oriented points, and (g)-(i) the three-dimensional surface at the 1st, 7th, and 13th views.

Figure 6: The time-varying shape and motion estimation for the color texture paper. (a)-(f) The captured images and (g)-(l) the three-dimensional surface at the 0th, 20th, 40th, 60th, 80th, and 100th frames.
Note on intrinsic dimension: the intrinsic dimension expresses the number of degrees of freedom necessary to describe local structure. Constant signals without any structure are of intrinsic dimension zero (i0D), arbitrarily oriented straight lines and edges are of intrinsic dimension one (i1D), and all other possible patterns such as corners and junctions are of intrinsic dimension two (i2D). In general, i2D signals can only be modeled by an infinite number of superimposed i1D signals. Signals of intrinsic dimension one correspond to $n = 1$ in (2.1).
Note on the Dirac operator: $D = \sum_i e_i\, \partial/\partial x_i$ and $\Delta = D^2$. A scale-space signal which has independent scales in each component $(f_3, f_4, f_5) = (r, g, b)$ can be defined by splitting the problem into three boundary value problems in $\mathbb{R}_{5,0}$, which form the system (2.12).

Table 1: The comparison of the estimated disparity with the different algorithms.
The selected algorithms from the literature are Sum of Absolute Differences (SAD) [18], Sum of Squared Differences (SSD) [18], Graph Cuts (GC) [18], Discrete Wavelet (DWT) [19], Complex Discrete Wavelet (CWT) [19], and Quaternion Wavelet (QWT) [20]. To make the comparison easier to interpret, the root mean square error (RMSE) and the percentage of bad disparities (PBD) are calculated as in [19].

Table 3: Experiment results for matching features and reconstructed 3D oriented points.