CKF-Based Visual Inertial Odometry for Long-Term Trajectory Operations

)e estimation error accumulation in the conventional visual inertial odometry (VIO) generally forbids accurate long-term operations. Some advanced techniques such as global pose graph optimization and loop closure demand relatively high computation and processing time to execute the optimization procedure for the entire trajectory and may not be feasible to be implemented in a low-cost robotic platform. In an attempt to allow the VIO to operate for a longer duration without either using or generating amap, this paper develops iterated cubature Kalman filter for VIO application that performsmultiple corrections on a single measurement to optimize the current filter state and covariance during the measurement update. )e optimization process is terminated using the maximum likelihood estimate based criteria. For comparison, this paper also develops a second solution to integrate VIO estimation with ranging measurements. )e wireless communications between the vehicle and multiple beacons produce the ranging measurements and help to bound the accumulative errors. Experiments utilize publicly available dataset for validation, and a rigorous comparison between the two solutions is presented to determine the application scenario of each solution.


Introduction
Visual inertial odometry (VIO) employs the sensor fusion between inertial measurement unit (IMU) measurements and camera's image information to enhance the accurate estimation of vehicle trajectory [1,2]. e VIO system architecture is summarized in Figure 1 where the front-end and back-end computations are designed to exploit the benefits of both sensors, produce reliable ego-motion estimation, and achieve robust performance. e vision frontend computation attempts to track 3D feature points through different camera images. Geometric constraints between multiple camera poses are established to fuse with the IMU data [3][4][5]. e effectiveness of the process and the estimation accuracy depend on the sensor fusion strategy. Recent literature has introduced multiple sensor fusion solutions with varying requirements of hardware computing resources [2]. However, VIO systems suffer from the inevitable accumulation of error. is limitation makes the system gradually diverge and even fail to track the vehicle trajectory over long-term operation. VIO only produces reliable estimation of the vehicle trajectory in short-term operation and short-distance travel.
is issue obviously demands the development of VIO techniques to allow the system to operate for longer duration.
In SLAM and odometry applications, loop closure [6][7][8] and global pose graph optimization [7] have proven to be effective in addressing the accumulative error issue. ese techniques require the entire trajectory and map to relocalize the vehicle estimates and improve the estimation accuracy. Such solutions demand extremely high computational load, memory utilization, and processing time for execution [9,10]. ese costs are challenging in real-time computation and may not be affordable for some microrobotic systems [1,10,11], having limited hardware computing capability. For example, the work presented in [10] reported the failure to execute these advanced techniques of VINS-Mono [7] on ODROID, an embedded PC with a hybrid processing unit [12]. Also, these solutions are effective only when the vehicle makes a closed-loop trajectory to reobserve some previous landmark features. Not all vehicle trajectories will include loops to activate the loop closure. Alternatively, in [9,13,14], a local optimization within the sliding window has been performed. eir implementations did not require the installation of any specialized hardware computing platform and some advanced optimization software library. e effectiveness of this technique depends on the tuning parameters and the criteria for terminating the iteration. e work described in [4] had developed a tightly coupled cubature Kalman filter-(CKF-) based algorithm for VIO application.
is was a filtering approach that also encountered the issue of error accumulation over long-term operation. However, the CKF-based algorithm utilized trifocal tensor geometry (TTG) for the computationally efficient visual measurement update.
e TTG-based model had shown enormous potential to implement the iteration for the local optimization. Hence, in this paper, we attempt to implement this idea and improve the filter estimation accuracy over long-term operation.
is paper describes the development of a novel iterated CKF-based VIO that performs multiple corrections on a single measurement. e process locally optimizes the estimate of the current filter state and covariance during the visual measurement update. e system employs maximum likelihood estimate (MLE) based criteria to terminate the optimization process. For comparison, this paper also investigates the combination of VIO and ranging measurements as a second solution. e wireless communications between the vehicle and multiple beacons produce ranging measurements and help to bound the accumulative errors. e integration follows the sequential-sensor-update approach, which enables the operational independence of each measurement update. To summarize, this paper makes the following contributions. Firstly, we utilize the benefit of TTG model to enhance the optimization during the filter update step. e implementation does not increase the system complexity significantly or require the installation of any advanced optimization library. is strategy is suitable for self-localization projects without using any additional sensors. Secondly, we investigate the combination of VIO and ranging measurements to bound the estimation error over long-term operation. is solution can be applied for largescale navigation projects with multiple known-location beacons.
irdly, the work described in this paper contributes to the group of VIO filtering approaches. Two solutions are proposed for long-term operation without using any map and the entire trajectory. We also conduct some comparisons to determine the benefits of each solution and application scenario. e remainder of the paper is organized as follows. Some related works are presented in the next section. Section 3 introduces system coordinates, the original VIO design based on CKF, and the issue of error accumulation. Section 4 describes the first solution for employing iterated CKF, while Section 5 describes the second solution for using ranging sensors. Experimental validation for each solution is also reported accordingly. Finally, Section 6 presents some discussion and conclusion.

Related Works
Generally, engineered devices are subject to unavoidable imperfections due to technical limitations of manufacturing processes and material properties. e defects result in many uncertainties (i.e., noise and disturbance) affecting the device's operation [15,16]. Device developers need to invest in more time and effort to understand the sources and impacts of these uncertainties before proposing any algorithm and designing optimal software architecture. e system, including camera and IMU, also encounters a class of similar problems that raise more challenges in calibration and designing computational procedures [2,6,7]. It is worthwhile to study uncertainties in the VIO system to improve its performance in long-term operations.
In the VIO literature, many researchers have addressed the accumulative error issue. Nonlinear optimization approach can minimize the estimation error over long-term operation. For example, Qin et al. employed a global 4-DOF pose graph optimization for the entire trajectory [7]. is strategy consumes a considerable amount of memory and CPU usage, especially when the size of the pose graph grows unbounded as the travel distance increases. A down-sample process was useful in this case to include only all primary key-frames within loop closure constraints. is process can  Figure 1: Overview of visual inertial odometry architecture.
limit the pose graph database to a certain size but negatively affect the quality of the estimation. e optimization can be enhanced within a sliding window to reduce the computational cost as in [6]. Loop closure approach is another common solution to address estimation drift [7,17,18]. If the loop is detected, the vehicle estimate is relocalized based on multiple constraints of camera observations. In general, compared to other existing VIO strategies [10], the VIO systems with these two approaches require considerable computational resources, processing time, and the installation of advanced optimization software libraries such as Google's Ceres Solver [19] and DBoW2 loop detector [20]. ese approaches are not preferable for microrobotic systems with limited memory and CPU speed. For example, the work presented in [10] needs to reduce the maximum number of features from 400 to 200, the key-frame of sliding window from 5 to 3, and the number of IMU linked frames from 3 to 2 in order to implement OKVIS [6] on ODROID. Similarly, the implementation of VINS-Mono without loop closure [7] on ODROID requires the reduction of the number of tracked features from 150 to 100. VINS-Mono with loop closure cannot be executed on ODROID [7].
Constructing the optimization process within a sliding window has reduced the computational complexity to apply for resource-constrained devices. Heo et al. upgraded the Multistate Constraint Kalman Filter (MSCKF) structure using the local optimization for all measurements before refining the global states [3,9,21]. e deployment attempts to utilize full information within the sliding window to improve the filter performance. e optimal relative pose constraints are inferred to include the relative motion constraints and any prior information during the filter update. Even so, the filter performance in noisy environments is questionable because if only shows a slightly smaller estimation error than the MSCKF in a practical experiment. Alternatively, some researches [13,14] implemented an iterative procedure to optimize the filter mean and covariance during the filter update. is kind of deployment attempts to minimize the localization errors at the cost of increased computational requirements. However, this increase is feasible for resource-constrained devices [13]. e complexity of the visual measurement model also affects the computational efficiency of the optimization execution. In our previous project [4], we have explored the potential of TTG, which can predict the visual measurements using three camera frames. Moreover, the TTG-based visual measurement model operates as a straightforward function and consumes less computational cost than the traditional model. It will be more efficient if we perform local optimization with the TTG-based model. In this paper, we will implement this idea, determine the benefits of using TTG in local optimization, and compare it with other solutions in an attempt to minimize the estimation error accumulation.
Besides inertial and visual measurements, some researches have included other measurements to improve the system performance in dynamic environments [22] and in some cases where camera images are not useful for navigation [23]. Peretroukhin et al. applied Convolutional Neural Network (CNN) to extract the sun direction directly from the existing image stream [22]. e extraction was used to correct the global orientation. Although the estimation of a sun direction vector improved vehicle trajectory tracking, that solution was not always available, particularly during cloudy weather and nighttime. It only affects the orientation and also requires considerable resources to train and execute the deep learning model of sun detection. Many studies attempt to address the computational issue of deep learning for real-time applications using parallel architecture [24][25][26]. ese solutions accelerate the deep learning execution for constrained hardware with competitive energy efficiency. Also, a Light Detection and Ranging (LIDAR) sensor can help to reduce visual drift in undesired lighting conditions [27,28]. is solution can improve the positioning accuracy over long-term operation satisfactorily. However, the deployment of LIDAR can raise the issues of power consumption and payload for microrobotic systems. e wireless communication between the vehicle (tag) and known-location beacon (anchor) has been popularly applied to supplement the primary navigation system [27,[29][30][31]. Such systems commonly detect the Time of Arrival (TOA) of signals encoded in the radio or acoustic waves to conduct a ranging measurement. For example, some researches [30,31] have deployed ultra-wideband (UWB) radio modules for ranging measurements.
is technique suffers from systematic errors such as uncertain wave speed or clock synchronization errors. Consequently, it cannot directly measure the true geometric range, which is why it is called pseudorange measurement [27]. Wang et al. integrated ranging measurements through solving optimization problem [31]. e process attempts to align UWB localization with VIO to produce an optimal estimate for the vehicle position. Similarly, the work presented in [32] formulated the optimization in the scheme of moving horizon estimation using CasADi software optimization library [33]. Despite the heavy computational cost, the experiment result suggested the use of ranging measurements to bound the accumulative errors over a long-term operation. is also demands a better strategy to integrate VIO with ranging measurement for resource-constraint systems. Alternatively, in this paper, we will apply a sequential-sensor-update approach in the scheme of Kalman filter for the multisensor integration. It is interesting to compare the effectiveness of two proposed approaches (i.e., local optimization against additional-ranging integration) in improving the VIO estimate over a long-term operation.

Preliminaries
is section attempts to summarize a CKF-based design for VIO sensor fusion algorithm and the benefits of TTG-based measurement model, which was developed in the previous work [4]. is design reveals the issue of the accumulative errors in long-term operation. Two solutions proposed in this paper will be presented in the next section.

System Coordinates and Notation.
Matrices and vectors are denoted in boldface. We use a superscript to indicate that the vector is expressed with respect to a specific reference frame. For example, A v is the vector v expressed in frame Journal of Robotics 3 {A}. e VIO coordinate system is assigned as in Figure 2 with a global frame {G}, a camera frame {C}, and an IMU frame {I}. e VIO system tracks the transformation of {I} with respect to {G} with the rotation matrix G R I and the translation matrix G p I . Similarly, the transformation of {C} with respect to {I} is represented by the rotation matrix I R C and the translation matrix I p C . e camera observes the landmark feature point C f i in order to perform the filter correction step. Note that rotation matrices have the essential properties of special orthogonal group SO (3) denotes the tangent space to the group SO(3)'s manifold (at the identity) and coincides with the space 3 × 3 of skew symmetric matrices. I denotes the identity matrix while 0 denotes the zero matrix. We use subscripts to indicate the sizes of these matrices such as I 3×3 . We also mention Special Euclidean Group SE(3) for describing the group of 3D rigid motion, at group's operations are listed: . We use quaternion approach to operate the rotation. R(q) is a function producing the rotational matrix from the quaternion q.

Filter State Formation and Prediction.
e VIO system utilizes IMU data for filter state propagation. At the time t, IMU sensor provides accelerometer and gyroscope measurements ( I a m and I ω m ), which are expressed in three directions (x, y, and z) with metric unit in {I} frame and modeled as presented in (1) [3,34].
where G g is the gravitational acceleration in the frame {G}; I ω is the platform angular velocity in the frame {I}; G a is the platform linear acceleration in the frame {G}; and b a and b b are bias of accelerometer and gyroscope. e residual noise terms n a and n g are modeled as zero-mean white Gaussian noise processes. G v I denotes the linear velocity of the platform. e filter state includes the IMU state and the last two poses T . e true filter state is described as a combination of the nominal state x k and the error state x k . In the filter prediction step, 4 th order Runge-Kutta numerical integration of kinetic equations (2) is utilized to compute the predicted state [3].
on the other hand, the error state is constructed as We prefer to present the filter rotation error through the error quaternion [3,35]. Obviously, this presentation will handle the quaternion in its minimal representation and improve numerical stability [34]. In the case of quaternion, we can execute the transformation G δq T filter state can be computed using G q I � G q I ⊗ G δq I ; (3) e kinetic equations (4) describe the error state operation.
ese equations are written in the form of _ x k � F c x k + G c n IMU with the noise vector n IMU � [n g , n a , n wa , n wg ] T associated with variance σ 2 g , σ 2 a , σ 2 wa , σ 2 wg , respectively. For discrete time implementation, F c is discretized to have F d as in [3,36]. en, the discrete time covariance Q d is derived with the continuous time system noise covariance Q c � diag(σ 2 g , σ 2 a , σ 2 wa , σ 2 wg ). e propagation of the state covariance matrix can be performed using the discretized error state propagation F d and the error process noise covariance matrices Q d as follows:  Journal of Robotics

Predicting Visual Measurement Using Trifocal Tensor
Geometry. e VIO system utilizes visual measurements for filter update. is step requires performing the 3D featurepoint reconstruction before predicting the measurements. is traditional model implements an inverse-depth least-squares Gauss-Newton optimization approach for the reconstruction, which requires considerable time for execution. Alternatively, we apply the point transfer approach using TTG [37], which has low computational complexity and can be used as a nonrecursive function to predict the measurement. is simplification is beneficial to enhance the optimization computation of solution 1. In this section, we summarize the main points of the TTG-based measurement model. Further descriptions about TTG can be found in [4,37].
Assuming that a feature point is tracked through three consecutive images (i.e., I 1 , I 2 , and I 3 ), TTG is applied to predict the visual measurement in the latest image I 3 . e computation utilizes three consecutive camera poses and tracked feature point. e image I 1 has a homogeneous coordinate of the feature point m 1 � [u 1 , v 1 , 1] T , while u 1 and v 1 are pixel coordinates. Similar denotation is applied with m 2 and m 3 . As described in Figure 3, the prediction is achieved through point transferring from the image I 1 to the image I 3 with 2 transfers. In transfer I, we aim to construct the line l 2 in the image I 2 , which is perpendicular to the epipolar line where K is an intrinsic camera matrix and m 1,i is i th element of m 1 . is computation is applied similarly for other tracked feature points. en, the residual can be calculated using the predicted and actual visual measurements. We can construct the visual measurement model h(x k , m 1 , m 2 , m 3 ) as (6). e model also includes the epipolar geometry relations to satisfy the observability constraints of TTG.
where the tensor T i (i � 1, 2, 3) is computed from three camera matrices . a i and b i are the i th columns of the respective camera matrices (i � 1, . . . , 4).
) can be calculated from three camera poses of the image i and the camera-IMU calibration with I R C and I p C . We apply similar procedure for other transformation matrices.

Cubature Kalman Filtering for Measurement Update.
CKF is applied to handle the high nonlinearity of the TTGbased measurement model. CKF is a Jacobian-free nonlinear filter, which applies a deterministic sampling approach, Spherical Radial Transformation (SRT), and cubature rule to generate a minimal set of sampling points. Propagating these sampling points through the nonlinear functions results in a better approximation of the mean and covariance (Figure 4). CKF shares common properties with UKF but is improved in numerical implementation and system stability. In [38], the authors have addressed the vulnerability of the UKF to system failure associated with the negatively weighted sigma points. e presence of these negative points may cause the system to halt or even fail when taking the square-root operation of the covariance matrix. is vulnerable step is removed in the CKF implementation [4,38].
CKF is a straightforward implementation of SRT in the recursive estimation. If a system has n state variables, the third order CKF selects 2n cubature points in order to compute the standard Gaussian weighted integral: where N(., .) is the conventional symbol for a Gaussian density and S is the square-root factor of the covariance P satisfying the equality P k|k−1 � S k|k−1 S T k|k−1 . Cubature points (s � 1, 2, . . . , 2n) are generated with its particular parameter: where e s ∈ R n×1 is the s th elementary column vector. ese cubature points are evaluated through the nonlinear measurement model (6) so as to obtain the predicted measurements (9). en the filter state and covariance are updated using the actual measurement z k and the predicted measurement z k|k−1 as described in (10) and (11).

CKF-Based VIO's Error Accumulation Issue.
After summarizing the main points of the CKF-based VIO design, this paper will analyze the error accumulation issue. For our purpose, we utilize KITTI dataset [39], collected in the city and residential areas. e grayscale image is obtained from the Point Grey Flea2 camera with 1392 × 512 pixel resolution, 90°× 35°opening angle, and 10 Hz update rate. e IMU measurement is obtained from OXTS RT3003 6-axis L1L2 RTK with 0.02 m/0.1°resolution and 100 Hz update rate. We use synced and rectified data, where the distortion effect is removed from the camera image. e estimation is inevitably subject to the error accumulation (drift) like any other odometry estimate. Figures 5 and 6 present the VIO estimation when the vehicle travels in long distance. We can observe that the Root-Mean-Square Error (RMSE) is accumulated gradually. e estimation drifts from its real trajectory, which can be observed clearly in Figures 5 and 6. In Figure 5 with dataset 2011_09_30_0034, the VIO estimation drifts about 50 m after traveling 900 m. Similarly, the VIO estimation drifts about 45m after traveling 1600m with dataset 2011_09_30_0033 in Figure 6. e CKF-based system has employed IMU data for the filter state prediction, and camera image for the filter correction. We can say that the drift mainly derives from the poor performance of the camera observation. e visual measurement update step is unable to correct the residuals completely and suppress the drift effectively [4]. e error accumulation can come from various sources: (i) e camera resolution and calibration are limited to provide reliable measurements in some particular cases such as traveling too fast. (ii) e monocular camera with front-looking configuration suffers limited depth perception. (iii) e sensor-fusing algorithm is limited to capturing the uncertainties and eliminating the environment noise.
e first two issues (i)-(ii) are associated with the hardware configuration, while the last issue (iii) derives from the sensor-fusing technique. Additionally, we can observe that the accumulative errors grow unbounded in an assumed unlimited time. e rotation error, which is limited to the range [−π, π], also contributes to the position error accumulation. In short-term consideration (about 1 second), the drift can be modeled as a linear motion to simplify the problem formation [28]. In long-term consideration, the drift will not grow linearly in the traveled distance. It can be treated as a random process and will increase or decrease depending on the errors in motion vectors [40]. Assuming that the modification of the hardware setup is not an optimal solution, this paper will address the issue through the sensor-fusing algorithm with two proposed solutions in the next sections.

Solution 1: Iterated Cubature Kalman Filter
is section presents the first proposed solution to address the error accumulation issue. We upgrade the filter design presented in Section 3 to perform multiple iterations of the visual measurement update step. is advanced computation helps to optimize the estimation and reduce the accumulative errors.

Iterated Measurement Update.
In general, the estimated filter state x k|k is closer to the true filter state than the predicted filter state x k|k−1 . During the iteration, the estimated state x (j) k|k at j th iterate produces a better approximation to the filter true state than the estimated state x (j−1) k|k at (j − 1) th iterate. e iterated filter state correction is performed as (12): e iteration computes the positive-definite covariance matrices P  Journal of Robotics k|k . However, when each element of the matrix P (j) k|k is bounded, we also have lim j⟶∞ P (j) k|k � lim j⟶∞ P (j−1) k|k [41]. Additionally, it is inferred from (13) that lim j⟶∞ K (j) k � 0 violates the initial assumption (lim j⟶∞ K (j) k ≠ 0). Actually, the assumption cannot be hold during the iteration. erefore, lim j⟶∞ K (j) Consequently, the iteration achieves the global convergence.
During the iteration, the state variation Δx k|k is evaluated as (14). Obviously, a predetermined threshold ρ can be set for Δx (j) k < ρ to terminate the iteration. However, tuning the threshold ρ is challenging in dynamic noisy environments while the optimal value is not guaranteed to be in the likelihood surface. is limitation gives rise to the development of an alternative approach to terminate the iteration.
k , a cost function is evaluated at j th iterate: It is complicated and impractical to determine the minimum value of the cost function J, which reveals the MLE of x k and z k [42]. Instead, we can observe that the inequality condition (16) is always satisfied during the optimization process. In other words, J(x (j) k ) is closer to the maximum likelihood surface than J(x k is a more accurate approximation to MLE than x e inequality is utilized to terminate the iteration. To apply for the cubature measurement update, the inequality condition is rewritten with P (j) as follows [41]:

Experimental Validation of Iterated CKF.
We also use KITTI dataset for experimental validation because it allows verifying the system performance in long-term operation with long-distance traveling. It also helps to observe the impact of the accumulative errors as well as the effect of implementing each solution. e estimates of these filters are presented in Google ™ Maps, where we also provide multiple zoom-in images for comparison. e evaluation criteria are RMSE and the rotation error. e ground-truth data is obtained from GPS/IMU inertial navigation system data. In this case, the iterated CKF has superior performance than the other filters. e CKF and UKF produce accurate estimates within the first distance of 200 m. e employment of the optimization process has effectively decreased the position estimation errors and minimized the accumulative errors. In Figure 8, we only observe the drift in the performance of iterated CKF after 800 m. e rotation estimates of these filters have similar accuracy in Figure 9. e iterated CKF continuously updates the bias of accelerometer and gyroscope at each time instant as in Figure 10. e IMU estimation with only the integration of IMU data is not able to produce reliable estimations and track the vehicle trajectory. e integration with the camera in VIO system has improved the quality of the estimation. Figures 11 and 12 present the experiment with KITTI dataset 2011_09_30_0033. e vehicle travels in a loop about ∼1800 m. Similarly, the employment of iteration helps to improve the estimation effectively despite the computational (1) Initialize the filter state and covariance matrix; (2) for k ∈ (1, . . . , ∞) do (3) Use IMU data to predict the nominal filter state x k with 4 th order Runge-Kutta numerical integration; (4) Calculate F d and Q d (5) Compute the propagated state covariance P k|k−1 ; (6) if New Image I k then (7) I k ⟶ I 3 . Perform feature detection for I 3 and match these features with features {m 1 } and {m 2 } to have feature tracking while ((Condition (17) is satisfied) and (j < N)) do (10) Calculate the variation rate Δx k|k using (14); (11) Generate Cubature points using (8); (12) Compute innovation covariance matrix and the predicted measurement: (9) and (10); (13) Compute the new filter state and covariance matrix: (12) and (13); (14) end (15) State transition and rearrange the covariance matrix with respective camara poses;  Journal of Robotics cost. Considering the rotation errors in Figure 12, all three filters have almost similar accuracy. e video demonstration of this solution can be located at youtu.be/-8SWh-cy-Ck.

Solution 2: Pseudorange Measurements
is section presents the second proposed solution to address the error accumulation. In solution 2, additional pseudorange measurements are tightly integrated with VIO, which will bound the estimation error and correct the positional drifts. e pseudorange measurement can be established by the wireless transmission between anchor and tag units. e tag unit is mounted on the vehicle and communicates with multiple anchor units, which are installed rigidly at known locations in the environment. e vehicle can be passive [43] or active [30] in the communication process, depending on how many vehicles are employed. Each pseudorange measurement can be modeled as in 48   Journal of Robotics where G p a,d is the position of the d th anchor, G p I is the current position of the vehicle, β is a bias of range error model, ζ d is the coefficient describing the influence of β on the pseudorange measurement, and ‖·‖ 2 is the Euclidean distance. Considering the real-time implementation, multiple hardware modules can be employed such as Decawave [44] or Time Domain's P410 UWB module [45].

Sequential-Sensor-Update Approach.
We set up multiple anchors along the vehicle trajectory as in Figure 13. In reality, multiple pseudorange measurements can enable the vehicle to self-localize by using either Time of Arrival (TOA) or Time Difference of Arrival (TDOA) measurements. In general, that self-localization system requires at least four available ranging measurements for accurate estimation [31,43]. However, if the vehicle is traveling long distance, it is difficult to always receive enough ranging measurements to perform self-localization. We consider here the case where the ranging measurements are only used to supplement the VIO. We can synthesize all measurements (i.e., visual and ranging measurements) into a single composite group sensor with only one measurement model and similarly apply CKF for the estimation. However, this group sensor approach assumes that all sensors have similar update rates and that measurements are always available. is assumption does not satisfy our system configuration when the    camera and ranging sensors operate independently. e number of available ranging sensors may vary depending on the communication process. As a result, the sequentialsensor method will be applied in this study. e sequentialsensor-update approach considers each sensor's observation as a separate and independent realization. Each sensor will operate following a specific observation model, which can be incorporated into the filter operation in a sequential manner [46]. A set of all observations made by the camera up to time k will be denoted by Z k c ≜ z c (1), z c (2), . . . , z c (k)}; a set of all observations made by the pseudorange sensor up to time k will be denoted by Z k r ≜ z r (1), z r (2), . . . , z r (k) ; hence, the set of all observations made by two sensors (camera and pseudorange) up to time k is constructed by Z k c,r ≜ Z k c ∪Z k r . Figure 14 presents the system architecture when integrated with single pseudorange measurement. It can be extended for multiple pseudorange measurements. Notably, the VIO system is the principle module while pseudorange sensory system is secondary. ese two systems are independent, where the pseudorange measurement update cannot intervene in the VIO operation. After performing visual measurement update, if the ranging measurement is not available for some reason, the system will proceed to the state transition. is property helps to sustain trajectory tracking even in the case where there is no communication with any anchor. e matrix H r,k � (zh/zx)| x k|k−1 describes how the filter states are mapped to the pseudorange measurement outputs, while it is computed by applying first-order Taylor series approximations to the pseudorange model. Meanwhile, the filter state includes three poses at times k, k − 1, and k − 2, which are guaranteed to satisfy the constraint of trifocal tensor geometry and epipolar geometry. Consequently, the pseudorange measurement model will be reconstructed so that it also corrects three poses at each time step: In the following equations, "r" denotes the ranging measurement while "c" denotes the visual measurement using camera. e filter state and covariance updated by the visual measurement are presented as x c,k|k and P c,k|k . When a new ranging measurement arrives, the system will use it to perform the filter update step following the sequential-sensor-update approach [46,47].
is approach also helps to handle the asynchronous data. e pseudorange innovation is computed as (20). e Kalman gain K r,k is calculated with variance R r � diag[v r , v r , v r ] as in (21). en the filter state x cr,k|k and covariance P cr,k|k corrected by both measurements can be computed as in (22).
x cr,k|k � x c,k|k + K r,k z r,k ; P cr,k|k � P c,k|k − K r,k H r,k P c,k|k .
e pseudorange sensor is employed to supplement VIO. It is important to identify and reject pseudorange measurement outliers before fusing with VIO. Mahalanobis distance d k of innovation covariance S r,k and innovation z r,k is measured to form the validation measurement gate, which defines the region in the pseudorange measurement space where valid measurements can be found. Any measurement outside that region is considered as an outlier and will not be integrated with VIO. e gate threshold c G is used to reject the pseudorange outliers. e innovation covariance S r,k is computed before evaluating the validation gate as in  Figure 14: System architecture for integrating pseudorange measurement. solution 2. We simulate the ranging system using GPS data. In Figure 15, multiple anchors are presented as cyan square marker. We denote the estimation of solution 2 as VIO-+ Ranging. Figures 15 and 16 shows the improvement of estimation accuracy. e use of additional pseudorange sensors helps to bound the estimation error over long-term operation. Additionally, in these zoom-in images of Figure 15, we can observe some specific locations where the vehicle is estimated off the road.
e system does not consider the issue of obstacle avoidance, which can be solved using laser-range sensor such as LIDAR.

Discussion and Conclusion
is section will discuss the pros and cons of the two proposed solutions, which helps to determine the scenario application of each in terms of system accuracy and hardware implementation. Solution 1 implements iterated CKF, which optimizes the latest filter state and covariance during the measurement update. Solution 2 employs the pseudorange measurements to bound the estimation errors.
Estimation accuracy: e experimental results reveal how each solution can improve estimation accuracy. Figures 15  and 16 evaluates average RMSE of position estimation for each solution. e effectiveness of solution 1 depends on the termination criteria, which decide the number of iterations. Meanwhile, the placement of the pseudorange sensors will influence the number of required ranging corrections to bound the estimation errors, which in turn affects the outcome of solution 2. Two proposed solutions can be classified in the group of VIO filtering approaches. Future work will compare the proposed solutions with the smoothing approaches (e.g., [7,17]).   Figure 15: Position estimation result of dataset 2011_09_30_0034 presented on Google Maps. the cost is reduced by 20%. e iterated CKF consumes more processing time than the VIO+Ranging due to the dimension of the measurement model. Solution 1 executes multiple iterations of the visual measurement update to optimize the estimation, while solution 2 only needs about 1-3 pseudorange measurements. e dimension of the feature-based measurement model is much larger than that of the pseudorange measurement model. Solution 1 increases the computational cost significantly, which may limit applicability to microand nanorobotic applications. To improve the computational efficiency, we can select a small subset of tracked landmark features, which in turn decreases the dimension of the visual measurement model. Alternatively, the use of specialized computing platforms (e.g., FPGA and GPU) can achieve parallel processing and speed up the VIO execution [48]. Discussions about selecting optimal hardware for implementation are beyond the scope of this paper.
Although both solutions allow the VIO estimation to operate for longer duration, the hardware/software requirement of each solution is another consideration for implementation. Solution 1 with MLE-based optimization does not require the installation of advanced optimization library such as Google's Ceres Solver [19] and CasADi [33]. It is feasible to use the same system configuration for implementation even with hardware constrained platforms. On the other hand, solution 2 requires the setup of multiple anchors along the vehicle trajectory. It cannot be applied to unknown environments.
Overall, both solutions allow the VIO to operate for a longer duration without using a map andexecuting the optimization of the entire trajectory. A VIO developer will consider these mentioned advantages and disadvantages of each solution, the possible hardware configuration, and budget limitation before deciding which solution to select for upgrading his available VIO system. In general, solution 2 is more suitable for large-scale navigation projects, while solution 1 is preferable for stand-alone self-localization projects.

Data Availability
e data used to support the findings of this study are available from the public scene datasets.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.