Temporal Matching in Endoscopic Images for Remote-Controlled Robotic Surgery

Temporal matching is applied in the frame of the formation of high-level entities in remote-controlled robotic surgery. The objective is to track tumor boundaries over time to improve the segmentation stage in each image of the sequence to facilitate the tracking and localization of the tumor. It makes use of an attributed string matching technique to find the correspondence between tumor boundaries over time. Relationships are then exploited to reconstitute the tumor boundaries and remove the inconsistencies coming from the detection errors. Input data are free form shapes of different length representing the tumor boundary, extracted at a previous stage.


Introduction
With recent advances in telemedicine technology, it is now possible to perform closed-abdomen surgery on a moving organ of a remote patient using robotic instrument to minimize patient trauma and certain side effects. For robotic assisted surgery, dexterity is enhanced by microprocessor controlled mechanical wrists, which allow motion scaling for reducing gross hand movements and the performance of microscale tasks that are otherwise not possible. So far, two commercially available master-slave manipulator devices are specifically designed for robotic cardiac surgery [1]. Both systems improve the ergonomics of laparoscopic surgery and provide high dexterity, precision, and 3D visualization of the operating field. One of the significant challenges of the invivo surgery is the destabilization introduced by respiratory motion, thus severely affecting precise instrument-tissue interactions and the execution of complex grafts. The other one is the precise localization of the tumor boundary, while most of the automatic image-guided algorithms yield incomplete, discontinuous, imprecise detection results. And the two above challenges always trigger each other and form a chick-egg problem. Thus in this paper, we intend to solve this problem by a novel high-level entities formation method.
Many existing methods have been developed to extract structural features [2][3][4] but, at the same time, included some errors (incomplete detection leading to discontinuities, imprecise localization of contours) that increased the difficulties of the further processing like tracking or 3D reconstruction. One way to overcome these difficulties has been, these last years, to incorporate some interactivity [5]. The contour detection makes use of an interactive procedure for the first image, and is then used as initial model of a snake to extract the contours in the other images of the sequence. Thus in [5], for each model curve, control points were interactively defined as the anchor points of an active contour modeling. Initial control points were interactively selected and the 2D contours represented as a B-spline. The active contour model was then applied to fit each curves of the selected contours. In [5], the segmentation of the first image made use of an interactive tool based on the mouse moving. The operator selected an initial point inside the image, and uncoiled a thread with the mouse, roughly following the boundary. An optimal path research procedure based on a dynamic programming algorithm and using a maximum multiscale response map for constraining the path search was performed between the initial point and the current position of the mouse. The contour was thus piecewise built, by successively moving the mouse along the boundary. We focus here on the intermediate level between the segmentation and the reconstruction stages. We look for improving the automatic segmentation result in each image of the sequence by reconstituting high level entities, that is, the shape of tumors, in order to facilitate the tracking and localization of the tumor. A first intraimage analysis was previously performed to detect contours in each image of the sequence. It aimed at forming correspondences from previously extracted contours by using a string matching process [6]. The present work represents a continuation of the previous one. It aims at tracking boundaries over time to establish correspondences in the sequence. These correspondences will then be used to spatially prolong the segments in order to reconstitute the boundaries and get a symbolic description of the structures in the 2D space. We present in this paper the temporal matching method, which is based on an attributed string matching technique. The outline of this paper is as follows. Section 2 describes the temporal procedure. Section 3 provides results on real data. Some prospects for future work are then proposed in conclusion.

Temporal Matching
Attributed string matching techniques have been broadly applied in pattern recognition with the aim to look for specific shapes, comparing unknown figures occurring in the input image with a set of prototypic models in a classification purpose or emphasizing "landmarks" subsets common to a set of features. More recently, interest was found in image analysis in particular in shape recognition [7][8][9] and registration in the frame, for instance, of the 3D reconstruction from stereo views [10].

String Matching Concepts.
String matching techniques allow measuring the similarity degree between two strings of symbols.
Let S n ∈ Σ ln be an arbitrary string of length l n = |S n | defined on a discrete alphabet Σ. We denote s i n the ith symbol of S n for i ∈ {1 · · · |S n |} ∈ such as S ln n = s 0 n s 1 n · · · s i n · · · s ln−1 n . An edit distance, based on edit operations (insertion, deletion, and substitution of symbols), is then defined as the minimum cost operation sequence over all possible with o k equals to one of the three above cited edit operations. It makes use of a cost matrix, which specifies the costs γ(o k ) of individual elementary operations for all combination of symbols. The total cost of the transformation is then calculated by summing up the elementary costs applied to each operation in the sequence. The distance associated with the minimum cost operation sequence is given by the relation: Its computation can be carried out using optimization techniques. Among these ones, the classical dynamic programming, initially designed by Wagner & Fisher [6], has largely been used to calculate the edit distance.
The process goes through the edit matrix computation D(i, j), 1 ≤ i ≤ l n , 1 ≤ j ≤ l p using the following recursive relation: where λ denote the empty symbol and γ(s i n → s

Shape Representation and Cost Function.
Let L t,n and L t+d,m two lines to be matched with (t, t + d) characterizing the image number in the sequence and (n, m) the line numbers in each image. Elements of L t,n and L t+d,m were, respectively, noted p i t,n and p j t+d,m , i and j being the element indexes for each line. Four attributes were associated to each pixel p i t,n : the intensity I(p i t,n ), the Euclidean location d i t,n , the tangent t i t,n , and the local curvature κ i t,n . Each line was thus expressed as a sequence S ln t,n of symbols (l n being the length of the string), each symbol corresponding to a set of attributes, we formalized as a vector s i t,n = (I(p i t,n ), d i t,n , t i t,n , κ i t,n ). We designed a multiparametric cost function between sets of two vectors (s i t,n , s j t+d,m ) based on perceptual criteria: parallelism, proximity, curvature, and intensity similarity.
Deletion and insertion costs were set to 0.5, while the substitution cost, defined by the function f (s i t,n , s j t+d,m ), was expressed on the [0, 1] interval: where α, β, λ, γ were weighting coefficients such that α + β + λ + γ = 1. Each function was defined as an exponential of the form f (, ) = e −((Δ max −|Δ|)/|Δ|) , where Δ max characterized the maximum permitted deviation between two symbols for a given criterion and Δ, the difference measured between the two considered symbols. Δ max definition depends on the limit, we set up for each parameter deviation so that 0 ≤ f (, ) ≤ 1, value 0 depicting a good shape similarity.

Tumor Boundary Formation.
This stage aimed at establishing the temporal correspondences between segments of contours along the sequence. The attributed string matching was applied on sets of pairewise images, all possible temporal associations being taken into consideration.
For each contours L t,n in image I t , a contours neighborhood V (t,t+d),n was built in image I t+d such as where Δd max sets the upper limit for the contour displacement between two instants t and t + d and N n denotes the cardinal of V (t,t+d),n . All the possible combinations of contours, inside this neighborhood, were then considered for the pairing. With the segment running direction for the matching being not prior known, each extremity of a segment was numbered and two runs were considered according to these extremities. Two associations were then considered among the four possible combinations {(L 1 t,n , L 1 t+d,m ), (L 1 t,n , L 2 t+d,m )}, exponent 1 and 2 representing the extremity number. Attributed strings (S ln t,n , S lm t+d,m ) were then assigned to each two lines (L t,n , L t+d,m ). The attributed string matching process was applied for each two pairs of lines and the right running direction corresponded to the best association among the two, when choosing the minimal edit distance as selection criterion.
Because multiple assignments can occur in some situations (when several contours are close to each other or when spurious segments are present), two criteria were thus defined in order to keep only the one-to-one matching (1) A threshold on the normalized edit distance aimed to remove all the inconsistent associations: matched lines whose edit distance was greater than this threshold were cancelled. (2) An overlapping criterion was then considered to remove redundant associations. These associations are considered as right if there is no overlapping between the matched sets. On the contrary, these associations are wrong and have to be filtered. Thus when several candidate curves were associated to the same curve with some overlapping between these associations, the matched lines were sorted out on the edit distance value and the established trace between the paired symbols. Considering for each line L t,n , the matched line list by decreasing edit distance values, if the trace between the matched pixels intersected a curve, the pair of lines having the highest edit distance value was removed.

Result and Discussion
The matching procedure was applied to endoscopic images acquired from standard optical colonoscope (from our collaborator: Stanford University), and the preliminary results can be found in Figure 1. Whatever the detection operator and the decision rule, it must be accepted that no exact result can be derived. This statement, which applies to edge detection too, was the first motivation of this work. In a first stage, edges were grouped using spatial properties to construct coherent pieces [6]. This process was operated in an intraimage mode. The next stage, that is, the temporal matching, takes advantage of the motion correspondence to solve spatial ambiguities and improve the segmentation results in presence of occlusion or discontinuities on the images. The matching process was led over time along the sequence. It was pairewise applied on the contour and allows at the same time to get a first approximation of the motion between the structures. The parameters setting resulted from a learning stage, which allowed studying the behavior of the algorithm according to the parameters choice and combination. This study showed that the results were not sensitive to the choice of the parameters. The best result was obtained with the following setting: α = 20%, β = 15%, 4 International Journal of Telemedicine and Applications λ = 30%, γ = 35%, ΔI max = 20 (maximum intensity difference), Δθ max = π/4 (angular difference between the tangents), Δκ max = 0.5 (maximum curvature with |Δκ| = |κ i t,n − κ j t+d,m |/8), Δd max = 30 (maximum distance). The distance parameter role is to penalize the structure displacements beyond this value. For these parameters, we computed the mean cross-correlation, the mean distance, and the mean normalized edit distance over the set of matched lines. We obtained the respective values: 0.963, 3.121, and 0.275. The branches formation was then performed over the sequence, based on the matching process results.

Conclusion
In this paper, we made use of an attributed string matching technique for the formation of high-level entities like boundaries of tumors in endoscopic images for remote-controlled robotic surgery. The advantage of the method is that few parameters have to be set to track the contours over time. The established temporal relationships were exploited to spatially locate the contours and reconstitute the tumor boundaries. These results will be further used to complete the tumor boundaries that were previously performed on each image of the sequence [6]. This work represents a contribution to the difficult problem of the tumor tracking and localization in robot-assisted minimal invasive surgery.