Camera Localization in Distributed Networks Using Trajectory Estimation

This paper presents an algorithm for camera localization using trajectory estimation (CLUTE) in a distributed network of nonoverlapping cameras. The algorithm recovers the extrinsic calibration parameters, namely, the relative position and orientation of the camera network on a common ground plane coordinate system. We first model the observed trajectories in each camera’s field of view using Kalman filtering, then we use this information to estimate the missing trajectory information in the unobserved areas by fusing the results of a forward and backward linear regression estimation from adjacent cameras. These estimated trajectories are then filtered and used to recover the relative position and orientation of the cameras by analyzing the estimated and observed exit and entry points of an object in each camera’s field of view. The final configuration of the network is established by considering one camera as a reference and by adjusting the remaining cameras with respect to this reference. We demonstrate the algorithm on both simulated and real data and compare the results with state-of-the-art approaches. The experimental results show that the proposed algorithm is more robust to noisy and missing data and in case of camera failure.


Introduction
Over the last few decades, ubiquitous use of large-scale camera networks has been ramping up in a wide range of applications such as visual surveillance of mass transportation sites, calamity watching, and traffic control.These camera networks essentially enable monitoring of extensively large areas and hence detection of interesting activities on a larger scale, which is impossible with the use of single cameras (see [1,2]).Existing activity detection systems primarily perform manual analysis of the data collected by such networks, which is an extremely tedious job; therefore, the development of automated data analysis and summarization tools are essential to accomplish maximum from these camera networks [3].
Calibration of camera networks is the first and foremost important step in the development of such an automated activity summarization system.The calibration defines the correspondence between points in the image plane and points in the 3D space and can be divided into intrinsic and extrinsic calibration (also labeled as localization).The intrinsic calibration establishes the relationship between camera-centric coordinates and images coordinates and can be performed by acquiring an object with known Euclidean structure, while extrinsic calibration defines relationship between scene-centric coordinate system and camera-centric coordinate system.For overlapping camera networks, the estimation of epipolar geometry is a popular choice for extrinsic calibration, where candidate corresponding points are initially extracted from the scene and then a model is learnt that minimizes the images and their reprojects [4].However, in many real scenarios, cameras do not have overlapping views (Figure 1).Examples of such scenarios are wide area surveillance of underground train stations and/or subways, and mobile ad-hoc networks of low-cost surveillance cameras.The large number of cameras makes it difficult or expensive to record cameras' locations manually or to equip each camera with a GPS unit [5].One way of addressing this issue is through the novel paradigm of automated nonoverlapping camera calibration algorithms, which enable cameras to determine their positions and orientations after the placement.In this paper, we present a camera localization algorithm using trajectory estimation (CLUTE) for a network of nonoverlapping cameras.The algorithm addresses the problem of recovering the relative position and orientation of multiple cameras whose intrinsic parameters are known.CLUTE uses temporal and geometric constraints derived from the available trajectory information to estimate the unobserved trajectory segments, which are then used to position the cameras on the common ground plane.The object motion information also helps in estimating the relative orientation of the cameras.The registration process aligns the cameras with respect to each other in a single camera frame (that of the reference camera).The algorithm is demonstrated on both simulated and real data, and its results are compared with state-of-the-art approaches.
This paper is organized as follows.Section 2 reviews state-of-the-art approaches for camera localization.Section 3 formalizes the problem under consideration.Section 4 provides detailed description of CLUTE.Section 5 demonstrates the results and analysis of CLUTE on real and simulated data and also compares its performance with two existing techniques.Lastly, Section 6 draws the conclusions.

Related Work
Localization is a well-established problem in sensor (camera, audio, RFID, etc.) networks ( [6][7][8]).Algorithms for localization can be categorized into two main classes, namely, fine-grained algorithms (see [9,10]) and coarsegrained algorithms ( [11,12]).Fine-grained methods use timing and/or signal strength for localization.In this class of localization methods, only a few sensors positions are known.These sensors are called beacons or anchors (see [13][14][15]).The knowledge of beacons is then propagated across the entire network to find the position of the remaining sensors.The nodes measure the distance to their neighbors whenever possible using hardware ranging techniques such as received signal strength (RSS) and time difference of arrival (TDoA).However, the selection of the anchor nodes is a significant problem and using all anchor nodes does not give the most precise position.Moreover, there is the need to identify criteria for selecting the optimal number of anchor nodes to achieve a more accurate position estimation.
Economic factors and hardware limitations are key motivations for the use of coarse-grained localization methods.These methods estimate the proximity of the sensors in a network to an arbitrary reference sensor.Coarse-grained algorithms can further be divided into nonstatistical and statistical approaches.The first nonstatistical approach is multidimensional scaling (MDS) for localization ( [16][17][18]).MDS arranges the sensors in a lower-dimensional space, whose size depends on the application data.MDS is very accurate in recovering the sensor network configuration when precise distances are available between all sensor pairs.To compute the locations of N cameras on a 2D space using MDS [19], an affinity matrix is constructed based upon pairwise Euclidean distances.Then, the inner product matrix I using the double centering matrix and the affinity matrix is calculated, followed by the computation of the eigenvalues and the eigenvectors of I.After sorting the eigenvalues in descending order, the sensors' locations are calculated using the top p eigenvalues and their corresponding vectors.The major shortcoming of this approach is its dependency on the affinity matrix and the unavailability of a few distances degrades the overall performance significantly.
Another nonstatistical approach is used to calibrate a network of randomly placed cameras with nonoverlapping fields of view using moving scene features in the near and far fields [20].A strong assumption is made that object motions are deterministic.Distant objects (e.g., stars) enable the recovery of the orientation (rotation) of the cameras, while close objects (e.g., people or cars) enable the recovery of the translation of the cameras up to a scalar multiple.In this approach, the camera parameters are recovered by solving a complex geometry problem, without imposing a probabilistic framework.
Statistical approaches include numerical, motion-based, maximum a posterior (MAP), and velocity extrapolation based approaches.Numerical solutions are iterative methods for network localization, and each iteration contributes to the reduction of the residual errors.Existing approaches are generally variations of the gradient descent or the Gauss-Newton methods.As an example, Taylor et al. [21] use the Newton Raphson method to estimate the network configuration.Although the simplicity of such approaches is a key advantage, they heavily depend on the proper initialization and increment (or decrement) rate to find the global (or local) minimum.
In structure from motion (SFM), the trajectory of a moving camera, and the 3D coordinates of a stationary target are recovered simultaneously from a series of 2D images of a scene.In [22], the focus is on real-time processing of the image data using an extended Kalman filter, whereas the concept of recursive or sequential SFM can be found in [23,24].Similar to SFM, simultaneous localization and mapping (SLAM) localizes a moving sensor (robot) and estimates its trajectory using its egomotion and the stationary objects in the scene (see [25,26]).The performance of SLAM algorithms is affected by noise, as the robots rely on their camera to compute the distance traveled and therefore noisy

Classes
Approaches References

Fine-grained localization
Maximum likelihood, trigonometric, [13,14,30], Received signal strength, time difference of arrival [9,11] Coarse-grained (nonstatistical localization) Multidimensional scaling [16][17][18][19] Moving scene features [20] Active badge location [12] Coarse-grained (statistical localization) Linear regression and Kalman filter [31] Maximum a posterior probability estimation [21,27,32] Simultaneous localization and mapping [25,26] Structure from motion [22][23][24] Tracking and camera field of view information [28] Vanishing points and known position [29] measurements add up quickly.Environment maps can be helpful in these situations.Rahimi et al. [27] used the maximum a posterior (MAP) framework for simultaneous calibration and tracking.A network of nonoverlapping cameras is localized by using the motion of a target.The MAP estimates for the calibration parameters are calculated using the trajectory prior (i.e., the motion model) and the likelihood function, which are constructed from the available observations.The MAP approach is highly computationally complex.Furthermore, it is also possible that the solution may place the target inside the field of view of another sensor for which no observations are available at that particular time instance.
Javed et al. [28] use the concept of velocity extrapolation to project the field of view of one camera onto the other.The projection is then used as a tool to find the calibration parameters.However, the approach assumes that people walk in a straight line in the unobserved regions.Finally, Junejo et al. [29] propose an approach in which vanishing points are used to find the relative orientation of the cameras whose positions are already known.
A summary of the state-of-the-art approaches for sensor localization is presented in Table 1.

Problem Formulation
Suppose we have a network of N non-overlapping cameras ψ = {C 1 , C 2 , . . ., C N }, similar to [33].Let a trajectory ℘ i within C i be represented as ℘ i = {(x i ( j), y i ( j)) : 0 < j < M i ; i = 1, . . ., N}, where (x i , y i ) is the estimated position of the target in the image plane and M i is the number of target observations from camera C i .Furthermore, let each observation be generated by a motion model as where (x i ν , y i ν ) is the velocity of the object.In addition, (a x , a y ) may change over period of time.Moreover, v j (N (0, Σ v( j) )) is modeling additive noise with covariance Let each camera C i provides a vertical top-down view of the scene (i.e., its optical axis is perpendicular to the ground plane or the trajectories are preprocessed using a homography transformation [34]).Under this assumption, the number of parameters for the localization of each camera C i is reduced to two, namely, the camera position, P i = (p i x , p i y ), and the rotation angle, φ i , expressed as the relative angle between the camera C i and the horizontal axis (Figure 2).To summarize, the unknown parameters Θ i for camera C i are If C i observes the object at a particular time instant t and after t + τ time intervals the object enters into C i+η with C i / = C i+η , then it can be visualized as C i+η is viewing the object from the φ i,i+η ℘ i,i+η position, where φ i,i+η and ℘ i,i+η are the rotation matrix and the translation vector.
The camera localization process estimates (℘ i,i+η , φ i,i+η ) such that the configuration estimation error ε becomes minimum, that is, where ( x (i,i+η) , y (i,i+η) ) is the projected estimate of the object's position from C i at t to C i+η at t + τ.  (p i x , p i y , φ i ) represent the unknown camera location and rotation to be estimated.

Kalman filter
Kalman filter

Proposed Approach
Let an object move in the environment and be tracked by each camera in the network.To find the rotation matrix, φ i,i+η , and the translation vector, ℘ i,i+η , between adjacent cameras, we propose a two-step iterative process.Two consecutive batches of measurements from two adjacent cameras form one iteration and the adjacency between cameras are defined by the motion of the object itself.The first step calculates ℘ i,i+η by estimating the missing trajectory between pairs of adjacent cameras.In the second step, φ i,i+η is calculated by utilizing the trajectory information and the object's exit and entry points of the fields of view of adjacent cameras.The details of each step are given below.

Trajectory Estimation in Unobserved
Areas. Figure 3 shows the flow diagram of the trajectory estimation process.Unlike [31], each trajectory segment ℘ i s from each camera C i is smoothed for intercamera trajectory estimation: where S is a smoothing function and i = 1, . . ., N. If C i and C i+η are the two adjacent cameras, then ℘ i,i+η is the estimated trajectory between the camera pair: where G and H are nonparametric and a parametric functions, respectively, which extrapolate a smoothed trajectory in the unobserved region.We use the Kalman filter (see [35,36]) as a parametric function (H).The approach is inspired by speech recognition, where the Kalman filter [37] has extensively been used to estimate intervals with missing observations [38].However, the Kalman filter's innovation signal distorts significantly in the absence of target information.To overcome this problem, we employ linear regression estimation as a nonparametric function, as, in the unobserved regions, Kalman filter and the linear regression models exhibit a similar behavior (see the appendix).
Within each camera's field of view, the Kalman filtering is applied to obtain the initialization of the parameters.When the target enters an unobserved region, the linear regression model replaces the Kalman filter to estimate the target position.In order to improve the estimation, we use both forward and backward motion models.The process  , for each segment k: where α(k) = k/J and k = 1, . . ., J are J segments of the estimated trajectory.The forward estimation results are given weights that are decreasing with the distance from the border of the camera's fields.The backward estimation results are given higher weights when the object gets closer to the next camera's field of view.The underlying assumption for this approach is that both trajectories contribute to the construction of the estimated trajectory [39].The linear regression model expands over the uncertainty volume of the region so that when the target is once again visible in a camera, it can immediately reinitialize the Kalman filter.For simplicity, sharp turns are modeled as smooth curves (Figure 4).The process terminates if the difference between five consecutive iterations is smaller than a threshold or when all the available data are utilized.

Orientation Estimation.
The relative angle φ i,i+η between two adjacent cameras C i and C i+η is computed by calculating the angle between the observed object position (x i+η , y i+η ) in camera C i+η and the corresponding estimated object position ( x (i,i+η) , y (i,i+η) ) in the same camera by extrapolating the trajectory from C i : x i+η , y i+η x (i,i+η) , Once φ i,i+η is computed for all pairs of adjacent cameras, the final configuration is obtained by rearranging all the cameras C i (i = 2, . . ., N) with respect to the reference sensor C 1 .An example of the complete localization process of the proposed algorithm in shown is Figure 5.The figure shows the original trajectory segments (linear and nonlinear) as well as reconstructed trajectory segments in unobserved regions.Furthermore, intermediate and final localization results of the proposed algorithm are also provided in the same figure to demonstrate the process of localization over iterations.

Experimental Results and Analysis
In this section, we compare the proposed approach (CLUTE) with the MDS (see [16,17]) and the MAP approaches (see [27,32]) on both simulated and real data.A 4-camera and a 8-camera network, are tested with simulated data, and a 4-camera network is tested with real data.The performance of the algorithms is evaluated based on the translation and rotation errors in the localization using ground-truth information.The translation error is calculated as Euclidean distance between the true camera position and the estimated camera position.The rotation error is calculated as absolute difference between the true camera orientation (the angle with respect to the reference camera) and the estimated camera orientation.The details about the dataset used and the experimental results are given below.

4-Camera Network.
In this setup, we first simulate a network of 4 cameras and then test the algorithms on a real camera network.Let an object travel across the network and generate the trajectory.Each camera observes the object within its field of view (in Figure 6, the observations within each field of view are shown with circles).The lines outside the squares are the unobserved track points of the object.We analyze the results of three experiments using the same network configuration, but with three different moving objects.The CLUTE results are shown in Figure 8.The translational error at each iteration is shown in Figure 9.
The performance of CLUTE is evaluated and compared with MAP and MDS on the original, noisy, and subsampled data.Also, the algorithms are compared in the case of camera failure.
Figure 10 compares the results of CLUTE, MAP, and MDS along with the true positions of the cameras.The visual inspection of the results show that in all three experiments, the performance of the statistical approaches is better than that of nonstatistical approaches (especially for C 2 in the experiment 1 dataset, for C 3 in the experiment 2 dataset, and for C 3 and C 4 in the experiment 3 dataset).The detailed experimental results are shown in Table 2.In the first experiment, the average translation error for CLUTE is 0.18 units or 4.5% of the environment area.Furthermore, the average rotation error for CLUTE is 1.29 • .In comparison with the other two approaches, on average CLUTE is more accurate in locating the network.The average translation error for MAP and MDS is poorer by 0.50% and 3.25%, respectively.Likewise, the average rotation errors for MAP and MDS are worse by 1.35 • and 4.63 • , respectively.In the second experiment, the average translation error for CLUTE is 0.13 units or 3.25% of the environment area, which is better by 1.25% and 4.25% compared to MAP and MDS, respectively.Moreover, the average rotation error estimated by CLUTE is better by 0.41 • and 2.84 • than that of MAP and MDS, respectively.In the third experiment, the average translation error for CLUTE is 0.24 units, which are 6% of the environment area.Also, the average rotation error for CLUTE is 5.37 • .In comparison, the average translation error for MAP and MDS is 0.32 units and 0.46 units, respectively.Similarly, the average rotation angles for MAP and MDS approaches are 8.55 • and 13.72 • .In summary, these experiments show that the performance of CLUTE is better by an average of 1.25% for translation and of 1.65 • for rotation with respect with MAP.In comparison to MDS, the performance of CLUTE is better by 4.5% and 5.28 • for estimating translation and rotation, respectively.In addition to this, the error variance for CLUTE is smaller than that for MAP and MDS.Furthermore, it is noticeable that the estimation results for the statistical approaches (i.e., CLUTE and MAP) are better than that of the nonstatistical approach (i.e., MDS).In general, the main limitation of MDS is that it is based on a single parameter (in this case, the shortest Euclidean distance) with an appropriate value.Therefore, the parameter calculation is accurate and the approach performs satisfactory only if given enough information.
We also analyzed the robustness of CLUTE to reduced sampled trajectory (missing data) and noisy trajectory and in case of camera failure.As MDS is poor in estimating the localization, we compare the robustness of CLUTE and MAP only (Table 3).
For the missing data test, the object's trajectory is downsampled by 2 and by 3 for all the three experiments.The results for downsampling by 2 show that on average CLUTE does not suffer for translation estimation in experiment 1 and experiment 2 and improves in experiment 3. Similarly for the rotation (especially for experiment 2 and experiment 3), a closer look at the results shows that in experiment 3 the estimation result is quite poor for C 3 on the original data compared to the other cameras.Figure 8 shows that   in most cases the object takes very sharp turns before entering into this camera.These sharp turns cause the degradation of the result, while downsampling reduces the sharpness of the turns and therefore simplifies the modeling.Compared to MAP, the average translation error over the three experiments for CLUTE is better by 2.75%.Similarly, the rotation estimation is better by 2.58 • in favor of CLUTE.Also the performance of MAP degrades considerably due to downsampling.The reason for this behavior is due to the fact that in MAP the track points provide the likelihood function over the trajectories and the camera parameters, and therefore the posterior probability depends directly upon the availability of enough reliable object measurements.Downsampling reduces the likelihood probability and hence degrades the overall results.On the other hand, in CLUTE, the trajectories are estimated by interpolation of the track points over time.For this reason, as long as the available object observations maintain the shape of the trajectory, the calibration will be performed accurately.For further downsampling by 3, the average translation error over the three experiments for CLUTE is still 4.25% better than that of MAP.Similarly, the rotation estimation is better by 3.35 • for CLUTE.Also when downsampled from 2 to 3, the degradation for the position estimation is just 0.02 units for CLUTE and 0.08 units for MAP.To summarize, CLUTE is more robust to reduced sampling rates compared to MAP.
In order to analyze the performance of CLUTE with noisy observations, we introduced a 5% Gaussian noise in the measurements with variance equal to the 15% of the camera's field of view, for all three datasets.For both CLUTE and MAP, the results degrade substantially for the position estimation (especially C 3 , C 4 in experiment 1 and C 2 , C 4 in experiment 3) and orientation (especially C 2 in experiment 3).This is due to the fact that the noise not only degrades the quality of the observations (essential for MAP), but also changes the shape of the trajectory (essential for CLUTE), which are necessary for accurate results from both approaches.However, it is noticed that on average the results for CLUTE are degraded by 8% for the translation and 20.02 • for the rotation, whereas the degradation for MAP is 8.5% and 20.20 • .
In order to analyze the localization performance in the case of camera failure, we removed one camera's observations from the available datasets (1-camera failure case).The complete results for the three experiments are shown in Table 4. On average, the results for CLUTE are degraded by 5% for the translation and 22.47 • for the rotation, whereas the degradation for MAP is 10.25% and 24.81 • .The results show that CLUTE is more robust in case of camera failure.
In the final experiment for the 4-camera network, we use real data captured indoor (Figure 7).The fields of view of the cameras are squares whose sides are 1.5 meters, and the cameras are between 3 and 4 meters apart.A toy car is  moving at varying velocities traces a long trajectory across the cameras.Based on the ground-truth segmentation of the moving object in the field of view of every camera, we compute the relative distance and orientation between the cameras based on trajectory estimation.Figure 11 shows the results obtained with CLUTE, and Table 5 compares the results of the three approaches on the real data.For CLUTE, the sensors were on average misplaced by 70 cm from the locations measured by hand, with an average orientation error of 10.33 • .For the orientation estimation, CLUTE performs on average better than MAP by 0.33 • and better than MDS by 5 • .For the position estimation, the error of   than those of MDS.Likewise, CLUTE outperformed MAP in estimating the orientation, as the average rotation error over the three datasets is lower by 0.90% and in estimating the translation MAP is better by 0.16%.We further investigate the robustness of the methods for missing data, noisy trajectories (Table 7), and camera failure.For downsampling by 2, the average translation error over the three experiments for CLUTE is 0.03 unit-size better than MAP.Similarly, the rotation estimation is better by 2.19 • .Also, for downsampling by 3, the average translation error using CLUTE over the three experiments is 1.5% better than using MAP.Furthermore, the rotation estimation is better by 3.69 for CLUTE.Also, when downsampled from 2 to 3, the degradation for the position estimation is 40% better for CLUTE, compared to MAP.For trajectories contaminated with a 5% Gaussian noise with variance equal to the 15% of the camera's field of view, the average translation error for CLUTE over the three datasets is 2.5% better than that of MAP and the rotation error is also better by 2.48 • .For the camera failure, we simulated the one, three, and five cameras failure situations by ignoring the measurements coming from one, three, and five cameras, respectively, and  by evaluating the localization accuracy of the remaining sensors.For one camera failure (Table 8), the average translation error (taken over three datasets) for CLUTE is 0.16 unit, which is better by 0.25 units compared to MAP.The rotation error for CLUTE is 6.81 • better than MAP.Likewise, the translation error for CLUTE in case of three (Table 9) and five (Table 10) cameras failing is 12% of the environment size.The translation results obtained by CLUTE are better than MAP by more than 5% in both cases.Also for the rotation error, the performance of CLUTE is better by 2.3 • and 9.52 • for the three and five cameras missing cases, respectively.

Conclusions
We proposed an algorithm to recover the network configuration of a set of cameras with disjoint views.The algorithm finds the position and orientation of each camera on a common ground plane and consists of two main steps: the estimation of the unobserved trajectory in the regions not covered by the cameras' fields of view and the estimation of the relative orientations of the cameras.Kalman filtering and linear regression are used for the estimation of the trajectory.Forward and backward estimations are used to increase the reliability of the results.The relative orientation of the cameras is obtained by using the exit and entry point information in each camera's fields of view.
We have compared the performance of the proposed approach with a statistical approach (MAP) and with a nonstatistical approach (MDS) on both simulated and real data.The experimental results show that CLUTE is more accurate in localizing the network compared to these stateof-the-art approaches.Also, the proposed approach is more robust for missing and noisy data and performs better in case of failure of one or more cameras.
Our current work includes further improvements of the trajectory estimation in unobserved areas by tracking, when available, the audio information captured with stereo microphones coupled with each camera [40].

Figure 1 :
Figure 1: Example of camera network with nonoverlapping fields of view.

2 yFigure 2 :
Figure 2: Schematic representation of a scene observed with nonoverlapping cameras (C i ).(p ix , p i y , φ i ) represent the unknown camera location and rotation to be estimated.

Figure 3 :
Figure 3: Trajectory estimation process for the unobserved region between camera C i and C i+η .

Figure 4 :
Figure 4: Example of sharp turns modeled with smooth curves in the unobserved regions (gray: original trajectory, green: estimated trajectory, gray square: original camera location (field of view), and yellow square: estimated camera location (field of view)).

Figure 5 :
Figure 5: An example of localization of a camera network: (a) a network of cameras along with the original object trajectory (dash-line); (a-d) few nonlinear turns that object take in unobserved regions, (b) estimated trajectory and final localization results with the proposed algorithm, and (c) outputs at intermediate iterations.

Figure 6 :
Figure 6: Three trajectory datasets from a nonoverlapping 4-camera network.Shaded squares indicate the field of view of each camera, circles denotes the observed object locations, and lines show the portions of the trajectories unobserved by the cameras.

Figure 7 :
Figure 7: Sequence of images extracted from real dataset consisting of four nonoverlapping cameras.Frame numbers show the motion of the object across the network.

Figure 8 :
Figure 8: CLUTE localization results for the 4-camera network.Yellow squares are the estimated localization and green lines indicate the e6stimated trajectory in the unobserved areas.(a) Experiment 1 dataset; (b) experiment 2 dataset; (c) experiment 3 dataset.

Figure 9 :
Figure 9: Evolution of the CLUTE translation error at each iteration in the three experiments.

Figure 11 :
Figure 11: Network localization results using CLUTE on real data.(black: original trajectory, green: estimated object trajectory in the unobserved areas, gray square: original camera location (field of view), andyellow square: estimated camera location (field of view).

Table 1 :
Summary of the state-of-the-art approaches for sensor localization.

Table 2 :
Performance comparison between CLUTE, MAP, and MDS on the three datasets (4-camera network).

Table 3 :
Performance comparison between CLUTE, MAP, and MDS on the three datasets for missing and noisy data (4-camera network).

Table 4 :
4-camera network: CLUTE and MAP comparison for 1camera failure.

Table 5 :
The rotation (in degrees) and translation (in meters) errors of CLUTE for real data.Three datasets have been generated by simulating three different moving objects in a network of 8 cameras with nonoverlapping fields of view.The results obtained with CLUTE on these datasets are shown in Figure12.Table6contains the results obtained in each experiment.CLUTE outperformed MDS as its average translation and rotation errors are 13.75% and 3.63 • lower

Table 6 :
Performance comparison between CLUTE, MAP, and MDS on the three datasets (8-camera network).

Table 7 :
Performance comparison between CLUTE, MAP, and MDS on the three datasets for missing and noisy data (8-camera network).

Table 8 :
8-camera network: comparison between CLUTE and MAP in case of the failure of 1 camera.

Table 9 :
8-camera network: comparison between CLUTE and MAP in case of the failure of 3 cameras.

Table 10 :
8-camera network: comparison between CLUTE and MAP in case of the failure of 5 cameras.