We present a new approach to handling occlusions in multiview stereovision algorithms, for images destined for autostereoscopic displays. It takes advantage of information from all views and ensures the consistency of their disparity maps. We demonstrate its application in a correlation-based method and in a graph-cuts-based method. The latter uses a new energy that merges dissimilarity and occlusion evaluations. We discuss the results on real and virtual images.

Augmented reality has applications in many domains, such as games and medical training. Autostereoscopic display, meanwhile, is an emerging technology that adds a perception of depth, enhancing the user's immersion. Augmented reality can be applied to autostereoscopic displays in a straightforward way by adding virtual objects to each image. However, it is much more interesting to use the depth information of the real scene so that virtual objects can be hidden by real ones.

To that end, we need to obtain one depth map for each view. Because the images are destined to be viewed on autostereoscopic displays, we can work with a simplified geometry (e.g., no rectification is needed, epipolar pairs are horizontal lines of the same rank, and disparity vectors are thus aligned along the abscissa). However, our aim is to obtain a good assessment of depth in all kinds of scenes, without making any assumption about their contents: images may contain large homogeneous regions as well as highly varied colors. Also, due to the principle of autostereoscopic displays, the user sees two images at the same time. It is therefore crucial to have strongly consistent depth maps. For example, if a virtual object is drawn in front of a real object in one view, it has to be drawn in the same order in all views. We therefore introduce a new occlusion approach for multiview stereovision algorithms, which aims to ensure the consistency of the depth maps.
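To make the notion of consistency concrete, the following minimal sketch implements a generic cross-check between the disparity maps of two views along one scanline. It is not the occlusion approach introduced in this paper, only the classical test it improves upon. The function name `inconsistent_pixels`, the `tolerance` parameter, and the sign convention (a pixel at abscissa `x` in the reference view appears at `x - view_offset * d` in the other view) are assumptions made for illustration.

```python
def inconsistent_pixels(disp_ref, disp_other, view_offset, tolerance=0):
    """Return abscissas of the reference scanline whose disparity
    disagrees with the disparity stored at the matched pixel of the
    other view.

    disp_ref, disp_other: lists of disparities, one per pixel of a scanline.
    view_offset: index difference between the two views (assumed convention).
    """
    bad = []
    for x, d in enumerate(disp_ref):
        # Epipolar lines are horizontal, so the match lies on the same row.
        x_other = x - view_offset * d
        if 0 <= x_other < len(disp_other):
            if abs(disp_other[x_other] - d) > tolerance:
                bad.append(x)
    return bad


# A foreground object (disparity 2) in front of a background (disparity 0).
disp_ref = [0, 0, 2, 2, 0]
disp_other = [2, 2, 0, 0, 0]
flagged = inconsistent_pixels(disp_ref, disp_other, view_offset=1)
```

In this toy example the two background pixels of the reference view that are covered by the foreground object in the other view are the ones flagged, which is why disagreement between maps is commonly read as a sign of occlusion.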

We propose an example of application of our approach in a correlation-based method and in a symmetrical graph-cuts-based method. Finally we discuss their results.

Stereovision algorithms aim to find disparity maps, from which depth maps are deduced. For this reason, we use the term “disparity maps” rather than “depth maps” in the remainder of the paper. Depth maps can easily be obtained from disparity maps by triangulation.
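As an illustration of the triangulation step, the sketch below uses the standard relation for parallel cameras, depth = focal length × baseline / disparity. It assumes that disparity is measured in pixels between two adjacent views, that the focal length is expressed in pixels, and that a zero disparity corresponds to a point at infinity; the exact relation for a given autostereoscopic camera rig may differ. The function and parameter names are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline):
    """Depth of a point seen by two parallel cameras.

    disparity_px: disparity between the two views, in pixels.
    focal_px: focal length, in pixels (assumed known from calibration).
    baseline: distance between the two optical centers; the returned
              depth is expressed in the same unit.
    """
    if disparity_px <= 0:
        # Under the assumed convention, zero disparity means infinitely far.
        return float("inf")
    return focal_px * baseline / disparity_px


# Example: 10 px of disparity, 1000 px focal length, 5 cm baseline.
depth_m = depth_from_disparity(10, 1000, 0.05)
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for close ones.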

Let us assume we have a set of

Illustration of the parallel camera configuration with four optics (illustrated by black dots).

Optical flow algorithms are based on a cost

The reader can refer to Scharstein and Szeliski [

Many algorithms handle occlusions in order to obtain better disparity maps that preserve discontinuities at object boundaries. The first step in handling occlusions is to detect them. Egnal and Wildes [

it is extended to the multiview context,

it ensures a geometric consistency between depth maps.

After the detection step, the main difficulty is to handle occlusions in the matching algorithms. Woetzel and Koch [

There are two categories of methods based on energy minimization performing the matching while taking occlusions into account.

The first category contains iterative methods [

occlusions

To obtain better results, some methods return to step 2 once step 3 is over and loop until the system converges. The problem with iterative methods is that the disparity and occlusion estimations are independent: they do not interact with each other and thus do not ensure global geometric consistency.

The second category is composed of methods that estimate occlusions and disparities simultaneously. In the context of two views, Alvarez et al. [

Note that even when pixels are detected as occluded, their dissimilarities are still taken into account in the dissimilarity term. This term therefore contains dissimilarities of mismatched pixels, which have nothing in common, and this introduces noise into the energy. To solve this problem, Ince and Konrad [

By the same token, we have proposed a multiview graph-cuts-based method in [

Although these methods use smooth, discontinuity-preserving functions, they can still contain inconsistencies, which we detail in Section

Let us imagine a standard scene with a man behind a wall. Four views of this scene are shot; Figure

Examples of matching graphs.

We propose the following rules to define our approach to occlusions. Let us assume

if

if

if

In order to simplify writing, we call

In order to take the rules presented above into account, we use an energy function of the form

In the case of Figure

Finally, this term is given by

This energy function can be used in different methods as we will see in Section

In this section, we present two applications of our occlusion approach. The first one does not use any smoothness constraint; it isolates our approach to occlusions in order to show its relevance in a correlation-based method. The second one applies the energy function defined in the previous section to a graph-cuts-based method.

Both methods use the same constant

This method uses two distinct local costs. The first one supposes there is no occlusion and the second one supposes there is exactly one occlusion. These two costs are in competition by means of a Winner Takes All (WTA) algorithm.
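The competition between the two hypotheses can be sketched as follows for a single pixel of one scanline. This is a minimal illustration, not the costs used in our implementation: the mean absolute difference, the additive `occlusion_penalty`, the choice of discarding the worst-matching view, and the convention that pixel `x` of view `ref` appears at `x - (j - ref) * d` in view `j` are all stand-in assumptions.

```python
def wta_disparity(views, ref, x, max_disp, occlusion_penalty):
    """Winner-takes-all choice for pixel x of scanline views[ref].

    views: list of scanlines (lists of intensities), one per view.
    Returns (disparity, occluded_view), where occluded_view is None when
    the "no occlusion" hypothesis wins.
    """
    best_cost, best_disp, best_occ = float("inf"), 0, None
    for d in range(max_disp + 1):
        # Absolute differences with every other view where the match exists.
        diffs = {}
        for j, line in enumerate(views):
            xj = x - (j - ref) * d
            if j != ref and 0 <= xj < len(line):
                diffs[j] = abs(views[ref][x] - line[xj])
        if not diffs:
            continue
        # Hypothesis 1: the pixel is visible in every view.
        cost_all = sum(diffs.values()) / len(diffs)
        if cost_all < best_cost:
            best_cost, best_disp, best_occ = cost_all, d, None
        # Hypothesis 2: the pixel is occluded in exactly one view.
        if len(diffs) > 1:
            worst = max(diffs, key=diffs.get)
            cost_one = (sum(diffs.values()) - diffs[worst]) / (len(diffs) - 1)
            cost_one += occlusion_penalty
            if cost_one < best_cost:
                best_cost, best_disp, best_occ = cost_one, d, worst
    return best_disp, best_occ


# A bright pixel (9) visible in views 0 and 1, hidden by an occluder (4) in view 2.
views = [
    [0, 0, 0, 9, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0, 0, 0],
    [0, 4, 0, 0, 0, 0, 0, 0],
]
choice = wta_disparity(views, ref=0, x=3, max_disp=3, occlusion_penalty=1)
```

The penalty is what keeps the comparison fair in this sketch: without it, discarding the worst view could never increase the cost, and the occlusion hypothesis would win for every pixel.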

The first cost could be any local cost as found in the literature. Our implementation uses cost

The second cost

Example of the components of

Finally, the selection is based on a WTA algorithm: if the minimum cost for a pixel is obtained using

Our method is based on the energy function previously described in (

Now, we will see how to construct the graph corresponding to

Graph corresponding to

The smoothness of the result is ensured by term

To compare our methods, first with each other and then with existing ones, we use two sets of 8 images. The first one is a set of images of a virtual scene, which allows us to compare results against ground truth. The second one is a set of photographs taken at Palais du Tau in Reims [

A photograph taken at Palais du Tau (a) and a virtual scene (b).

We compare three pairs of methods. The first pair is composed of correlation-based methods. One uses the cost of (

Using the ground truth of the virtual scene, we give the error rate corresponding to each method in Table

Errors with ground truth on the virtual scene plus computation times on both Palais du Tau and virtual sets.

| Method | Occlusion | Error | Time, Tau (s) | Time, Virtual (s) |
| Correlation | without | 0.644 | 0.23 | 0.25 |
| Correlation | with | 0.630 | 1.25 | 1.24 |
| 1D smoothness | without | 0.718 | 6.88 | 6.02 |
| 1D smoothness | with | 0.572 | 17.89 | 12.39 |
| 2D smoothness | without | 0.692 | 12.02 | 8.98 |
| 2D smoothness | with | 0.543 | 38.13 | 18.36 |
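The trade-off in these figures is easier to read as ratios. The short sketch below computes, for each pair of methods, the relative error reduction and the slowdown factor obtained when the occlusion approach is enabled. The numbers are copied from the table; the dictionary layout and the function name `summarize` are choices made for this illustration.

```python
# (error, time on Tau in s, time on Virtual in s), copied from the table.
results = {
    "Correlation": {"without": (0.644, 0.23, 0.25), "with": (0.630, 1.25, 1.24)},
    "1D smoothness": {"without": (0.718, 6.88, 6.02), "with": (0.572, 17.89, 12.39)},
    "2D smoothness": {"without": (0.692, 12.02, 8.98), "with": (0.543, 38.13, 18.36)},
}


def summarize(rows):
    """Relative error reduction and slowdown factors for one method pair."""
    err0, tau0, virt0 = rows["without"]
    err1, tau1, virt1 = rows["with"]
    return {
        "error_reduction": (err0 - err1) / err0,
        "tau_slowdown": tau1 / tau0,
        "virtual_slowdown": virt1 / virt0,
    }


summary = {method: summarize(rows) for method, rows in results.items()}
for method, s in summary.items():
    print(f"{method}: error -{100 * s['error_reduction']:.1f}%, "
          f"time x{s['tau_slowdown']:.2f} (Tau), "
          f"x{s['virtual_slowdown']:.2f} (Virtual)")
```

With these values, the relative error reduction is about 2% for the correlation pair, 20% for 1D smoothness, and 22% for 2D smoothness, while computation takes roughly two to five times longer.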

Table

Figure

Disparity maps obtained using methods based on correlation (a),(b), a horizontal smoothness constraint (c),(d), and a 2D smoothness constraint (e),(f). Images (a),(c),(e) are computed without our occlusion approach and images (b),(d),(f) with it.

Extracts of Figures

On the other hand, graph-cuts-based methods allow the symmetrical minimization of energy, ensuring a strong consistency. Figures

We have introduced a new approach to handling the occlusions of a scene in a multiview context. To demonstrate the relevance of this new detection rule, we have presented two methods that handle object boundaries very accurately. Although these methods can handle two-view stereovision, they are designed for the multiview context with any number of views. Our results show that the occlusion approach succeeds in detecting object boundaries, at the cost of longer computation times, and can still produce disparities for pixels that are not visible in all views. Moreover, when used in symmetrical energy-minimization-based methods, our approach ensures a geometric consistency, which is crucial for autostereoscopic displays. Computation time remains the main weakness of our methods, so our objective is to find a way to minimize the energy faster. One idea is a GPU implementation of the graph cuts. Some work has already been done in this domain [

The work presented in this paper was supported in part by the “Agence Nationale de la Recherche” as part of the CamRelief project. This project is a collaboration between the University of Reims Champagne-Ardenne and 3DTV Solutions. The authors wish to thank Didier Debons, Michel Frichet, and Florence Debons for their contribution to the project.