Handling Data Uncertainty and Inconsistency Using Multisensor Data Fusion

Data provided by sensors is always subject to some level of uncertainty and inconsistency. Multisensor data fusion algorithms reduce the uncertainty by combining data from several sources. However, if these sources provide inconsistent data, catastrophic fusion may occur, where the performance of multisensor data fusion is significantly lower than the performance of each of the individual sensors. This paper presents an approach to multisensor data fusion that decreases data uncertainty while being able to identify and handle inconsistency. The proposed approach relies on combining a modified Bayesian fusion algorithm with Kalman filtering. Three different approaches, namely, prefiltering, postfiltering, and pre-postfiltering, are described based on how filtering is applied to the sensor data, to the fused data, or both. A case study to find the position of a mobile robot by estimating its x and y coordinates using four sensors is presented. The simulations show that combining fusion with filtering helps in handling the problem of uncertainty and inconsistency of the data.


Introduction
Multisensor data fusion is a multidisciplinary research area borrowing ideas from many diverse fields such as signal processing, information theory, statistical estimation and inference, and artificial intelligence. This is reflected in the variety of the techniques reported in the literature [1].
Several definitions for data fusion exist in the literature. Klein [2] defines it by stating that data can be provided either by a single source or by multiple sources. Data fusion is defined by the Joint Directors of Laboratories (JDL) [3] as a "multilevel, multifaceted process handling the automatic detection, association, correlation, estimation, and combination of data and information from several sources." Both definitions are general and can be applied in different fields including remote sensing. In [4], the authors present a review and discussion of many data fusion definitions. Based on the identified strengths and weaknesses of previous work, a principled definition of data fusion is proposed as the study of efficient methods for automatically or semiautomatically transforming data from different sources and different points in time into a representation that provides effective support for human or automated decision making.
Data fusion is applied in many areas of autonomous systems. Autonomous systems must be able to perceive the physical world and physically interact with it through computer-controlled mechanical devices. A critical problem of autonomous systems is the imperfection of the data that the system processes for situation awareness. These imperfection aspects [1] include uncertainty, imprecision, incompleteness, inconsistency, and ambiguity of the data, which may result in wrong beliefs about the system state and/or the environment state. These wrong beliefs can consequently lead to wrong decisions. To handle this problem, multisensor data fusion techniques are used for the dynamic integration of the multithread flow of data provided by a homogeneous or heterogeneous network of sensors into a coherent picture of the situation.
This paper discusses how multisensor data fusion can help in handling the problem of uncertainty and inconsistency as common imperfection aspects of the data in autonomous systems. The paper proposes an approach to multisensor data fusion that relies on combining a modified Bayesian fusion algorithm with Kalman filtering [5]. Three different approaches, namely, prefiltering, postfiltering, and pre-postfiltering, are described based on how filtering is applied to the sensor data, to the fused data, or both. These approaches have been applied in a simulation to handle the problem of data uncertainty and inconsistency in a mobile robot as an example of an autonomous system.
The remainder of this paper is organized as follows: Section 2 reviews the most commonly used multisensor data fusion techniques, followed by a description of Bayesian and geometric approaches in Section 3. The proposed approaches are presented in Section 4. A case study of position estimation of a mobile robot is discussed in Section 5 to show the efficacy of the proposed approaches. Finally, the conclusion and future work are summarized in Section 6.

Multisensor Data Fusion Approaches
Different multisensor data fusion techniques have been proposed with different characteristics, capabilities, and limitations. A data-centric taxonomy is discussed in [1] to show how these techniques differ in their ability to handle different imperfection aspects of the data. This section summarizes the most commonly used approaches to multisensor data fusion.

Probabilistic Fusion.
Probabilistic methods rely on probability distribution/density functions to express data uncertainty. At the core of these methods lies the Bayes estimator, which enables fusion of pieces of data, hence the name "Bayesian fusion" [6]. More details are provided in the next section.

Evidential Belief Reasoning.
Dempster-Shafer (D-S) theory introduces the notion of assigning beliefs and plausibilities to possible measurement hypotheses along with the required combination rule to fuse them. It can be considered a generalization of the Bayesian theory that deals with probability mass functions. Unlike Bayesian inference, the D-S theory allows each source to contribute information at different levels of detail [6].

Fusion and Fuzzy Reasoning.
Fuzzy set theory is another theoretical reasoning scheme for dealing with imperfect data. Being a powerful theory for representing vague data, fuzzy set theory is particularly useful to represent and fuse vague data produced by human experts in a linguistic fashion [6].

Possibility Theory.
Possibility theory is based on fuzzy set theory but was mainly designed to represent incomplete rather than vague data. Possibility theory's treatment of imperfect data is similar in spirit to probability and D-S evidence theory, with a different quantification approach [7].

Rough Set-Based Fusion.
Rough set theory is a theory of imperfect data developed by Pawlak [8] to represent imprecise data while ignoring uncertainty at different granularity levels. This theory enables dealing with data granularity.

Random Set Theoretic Fusion.
The most notable work on promoting random finite set (RFS) theory as a unified fusion framework has been done by Mahler in [9]. Compared to alternative approaches to dealing with data imperfection, RFS theory appears to provide the highest level of flexibility in dealing with complex data while still operating within the popular and well-studied framework of Bayesian inference. RFS is a very attractive solution to fusion of complex soft/hard data that is supplied in disparate forms and may have several imperfection aspects [10].

Hybrid Fusion Approaches.
The main idea behind development of hybrid fusion algorithms is that different fusion methods such as fuzzy reasoning, D-S evidence theory, and probabilistic fusion should not be competing, as they approach data fusion from different (possibly complementary) perspectives [1].

Handling Uncertainty and Inconsistency
Combining data from several sources using multisensor data fusion algorithms exploits the data redundancy to reduce the uncertainty. However, if these sources provide inconsistent data, catastrophic fusion may occur, where the performance of multisensor data fusion is significantly lower than the performance of each of the individual sensors. This section discusses four different approaches with different levels of complexity and ability to handle uncertainty and inconsistency.

Simplified Bayesian Approach (SB).
Bayesian inference is a statistical data fusion algorithm based on Bayes' theorem of conditional or a posteriori probability to estimate an $n$-dimensional state vector $x$ after the observation or measurement $z$ has been made. Assuming a state-space representation, the Bayes estimator provides a method for computing the posterior (conditional) probability distribution/density of the hypothetical state $x_k$ at time $k$ given the set of measurements $Z^k = \{z_1, \ldots, z_k\}$ (up to time $k$) and the prior distribution as follows:

$$p(x_k \mid Z^k) = \frac{p(z_k \mid x_k)\, p(x_k \mid Z^{k-1})}{p(z_k \mid Z^{k-1})}, \tag{1}$$

where
(i) $p(z_k \mid x_k)$ is called the likelihood function and is based on the given sensor measurement model.
(ii) $p(x_k \mid Z^{k-1})$ is called the prior distribution and incorporates the given transition model of the system.
(iii) The denominator is merely a normalizing term to ensure that the probability density function integrates to one.
The probabilistic information contained in $z$ about $x$ is described by the probability density function $p(z \mid x)$, which is a sensor-dependent objective function based on observation. The likelihood function relates the extent to which the a posteriori probability is subject to change and is evaluated either via offline experiments or by utilizing the available information about the problem. If information about the state $x$ is made available independently before any observation is made, then the likelihood function can be improved to provide more accurate results. Such a priori information about $x$ can be encapsulated as the prior probability and is regarded as subjective because it is not based on observed data.
The information supplied by a sensor is usually modeled as a mean about a true value, with uncertainty due to noise represented by a variance that depends on both the measured quantities themselves and the operational parameters of the sensor. A probabilistic sensor model is particularly useful because it facilitates a determination of the statistical characteristics of the data obtained. This probabilistic model captures the probability distribution of the measurement $z$ by the sensor when the state of the measured quantity $x$ is known. This distribution is extremely sensor specific and can be experimentally determined. The Gaussian distribution is one of the most commonly used distributions to represent sensor uncertainties and is given by

$$p(Z = z_j \mid X = x) = \frac{1}{\sigma_j \sqrt{2\pi}} \exp\left\{ -\frac{(x - z_j)^2}{2\sigma_j^2} \right\}, \tag{2}$$

where $j$ represents the sensors. Thus, if there are two sensors that are modeled using (2), then from Bayes' theorem the fused mean of the two sensors is given by the maximum a posteriori (MAP) estimate as follows:

$$x_f = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\, z_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, z_2, \tag{3}$$

where $\sigma_1$ is the standard deviation of sensor 1 and $\sigma_2$ is the standard deviation of sensor 2. The fused variance is given by

$$\sigma_f^2 = \frac{1}{\sigma_1^{-2} + \sigma_2^{-2}}. \tag{4}$$
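As an illustration of the two-sensor case, the following minimal sketch computes the MAP fused mean and fused variance for two scalar Gaussian sensor models using inverse-variance weighting. The function name and the numerical readings are hypothetical and chosen only for illustration; they are not taken from the experiments in this paper.

```python
def fuse_simplified_bayes(z1, sigma1, z2, sigma2):
    """Fuse two scalar Gaussian measurements.

    z1, z2: sensor readings; sigma1, sigma2: sensor standard deviations.
    Returns (fused_mean, fused_variance).
    """
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    # Each reading is weighted by the other sensor's variance,
    # so the more certain sensor dominates the fused mean.
    fused_mean = (var2 * z1 + var1 * z2) / (var1 + var2)
    # The fused variance is never larger than either input variance.
    fused_var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return fused_mean, fused_var


# Hypothetical readings (illustrative values only).
mean_close, var_close = fuse_simplified_bayes(10.0, 2.0, 12.0, 4.0)
mean_far, var_far = fuse_simplified_bayes(10.0, 2.0, 30.0, 4.0)
print(mean_close, var_close)
print(mean_far, var_far)
```

Note that the fused variance is identical in both calls even though the second pair of readings is far apart; the simplified approach has no term that depends on the disagreement between the sensors.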

Modified Bayesian Approach (MB).
Sensors often provide data which is spurious due to sensor failure or due to some ambiguity in the environment. The simplified Bayesian approach described previously does not handle spurious data efficiently. The approach yields the same weighted mean value whether data from one sensor is bad or not, and the posterior distribution always has a smaller variance than the variance of either of the individual distributions being multiplied. This can be seen in (3) and (4). The simplified Bayesian approach does not have a mechanism to identify whether data from a certain sensor is spurious, and thus it might lead to inaccurate estimation. In [11], a modified Bayesian approach has been proposed which considers measurement inconsistency. Consider

$$p(X = x \mid Z = z_1, z_2) \propto \frac{1}{\sigma_1 \sqrt{2\pi}} \exp\left\{ -\frac{(x - z_1)^2}{2\sigma_1^2 f} \right\} \times \frac{1}{\sigma_2 \sqrt{2\pi}} \exp\left\{ -\frac{(x - z_2)^2}{2\sigma_2^2 f} \right\}. \tag{5}$$

As shown in [11], the modification in (5) causes an increase in the variance of the individual distributions by a factor given by

$$f = \frac{m^2}{m^2 - (z_1 - z_2)^2}. \tag{6}$$

The parameter $m$ is the maximum expected difference between the sensor readings. A larger difference in the sensor measurements causes the variance to increase by a bigger factor. The MAP estimate of the state $x$ remains unchanged, but the variance of the fused posterior distribution changes. Thus, depending on the squared difference in measurements from the two sensors, the variance of the posterior distribution may increase or decrease as compared to the individual Gaussian distributions that represent the sensor models.
The difference between the simplified and the modified Bayesian approaches can be seen in Figures 1 and 2. In this example, there are two sensors, where sensor 1 has a standard deviation of 2 and sensor 2 has a standard deviation of 4. In the first case, shown in Figure 1, the two sensors are in agreement. It can be seen that the fused posterior distribution obtained from the modified Bayesian approach has a lower variance than each of the distributions being multiplied, indicating that fusion leads to a decrease in posterior uncertainty.
In the second case, shown in Figure 2, the two sensors are in disagreement. The fused posterior distribution obtained from the modified Bayesian approach has a larger variance than either sensor. However, the fused variance of the simplified Bayesian approach is the same as the fused variance in Figure 1. This confirms, as in [11], that the modified Bayesian approach is highly effective in identifying inconsistency in sensor data and thus in reflecting the true state of the measurements.
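The behavior described for Figures 1 and 2 can be reproduced with a short sketch. It assumes that the variance inflation factor takes the form $f = m^2 / (m^2 - (z_1 - z_2)^2)$, where $m$ is the maximum expected difference between the readings, and that each sensor variance is multiplied by $f$ before fusion. The function name, the value of $m$, and the readings are hypothetical; only the standard deviations of 2 and 4 come from the example above.

```python
def fuse_modified_bayes(z1, sigma1, z2, sigma2, m):
    """Fuse two scalar Gaussian measurements with variance inflation.

    m is the assumed maximum expected difference between the readings.
    Returns (fused_mean, fused_variance, inflation_factor).
    """
    diff_sq = (z1 - z2) ** 2
    if diff_sq >= m ** 2:
        # The factor is only defined while the readings differ by less than m.
        raise ValueError("difference between readings must be smaller than m")
    f = m ** 2 / (m ** 2 - diff_sq)
    var1, var2 = f * sigma1 ** 2, f * sigma2 ** 2
    # Both variances scale by the same factor, so the mean is unchanged.
    fused_mean = (var2 * z1 + var1 * z2) / (var1 + var2)
    fused_var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return fused_mean, fused_var, f


# Sensors in agreement (hypothetical readings): variance drops below both sensors.
mean_a, var_a, f_a = fuse_modified_bayes(10.0, 2.0, 10.5, 4.0, m=10.0)
# Sensors in disagreement: variance grows beyond both sensors.
mean_d, var_d, f_d = fuse_modified_bayes(10.0, 2.0, 19.0, 4.0, m=10.0)
print(mean_a, var_a, f_a)
print(mean_d, var_d, f_d)
```

With these illustrative values the fused variance in the agreeing case stays below the variance of the better sensor, whereas in the disagreeing case it exceeds the variance of the worse sensor, while the fused mean is the same as the simplified Bayesian estimate in both cases.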

Geometric Redundant Fusion (GRF).
The GRF method is another method used for fusing uncertain data coming from multiple sensors. The fusing of $N$ uncertain $n$-dimensional data points is based on a weighted linear sum of the measurements as follows:

$$x_f = \sum_{i=1}^{N} W_i x_i, \tag{7}$$

where $x_f$ is the fused value and $W_i$ is the weighting matrix for measurement $x_i$. Applying expected values to (7) and assuming no measurement bias yields the following condition:

$$\sum_{i=1}^{N} W_i = I. \tag{8}$$

For a given set of data, the weighting matrix, $W$, and the covariance matrix, $Q$, are formed as follows:

$$W = \begin{pmatrix} W_1 & W_2 & \cdots & W_N \end{pmatrix}, \qquad Q = \begin{pmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_N^2 \end{pmatrix}, \tag{9}$$

where $\sigma_i^2$ is the uncertainty ellipsoid matrix for measurement $i$. Using Lagrange multipliers, the following results for the weighting matrices and fused variance are obtained:

$$W_i = \left( \sum_{j=1}^{N} (\sigma_j^2)^{-1} \right)^{-1} (\sigma_i^2)^{-1}, \qquad \sigma_f^2 = \left( \sum_{i=1}^{N} (\sigma_i^2)^{-1} \right)^{-1}. \tag{10}$$

In the one-dimensional case with two measurements the fused result becomes

$$x_f = \frac{\sigma_2^2 x_1 + \sigma_1^2 x_2}{\sigma_1^2 + \sigma_2^2}, \qquad \sigma_f^2 = \frac{1}{\sigma_1^{-2} + \sigma_2^{-2}}. \tag{11}$$

While the GRF method handles the fusion of $N$ measurements in $n$-dimensional space in an efficient manner, it does not include information about the spacing of the means of the $N$ measurements in the calculation of the fused variance. That is, the magnitude of the spatial separation of the means is not used in the calculation of the uncertainty of the fused result. Figures 3 and 4 show the fusing of two one-dimensional measurements using GRF. It can be observed that the uncertainty of the GRF result remains the same regardless of the separation or the inconsistency of the measurements.
Therefore, the GRF method provides a fused result with identical uncertainty independent of whether the measurements have identical or highly spatially separated means. To overcome this problem, a heuristic method was developed to consider the level and direction of measurement uncertainty as reflected in the level and direction of disparity between the original measurements. Thus, the output is no longer purely statistically based but can still provide a reasonable measure of the increase or decrease of uncertainty of the fused data.
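The insensitivity of GRF to measurement separation can be checked with a minimal sketch of the scalar case, where the minimum-variance weights that sum to one reduce to normalized inverse variances. The function name and the readings are hypothetical, and the sketch covers only one-dimensional measurements, not the general matrix form.

```python
def grf_fuse_1d(measurements, sigmas):
    """Fuse any number of scalar measurements by a weighted linear sum.

    measurements: list of readings; sigmas: list of standard deviations.
    Returns (fused_value, fused_variance, weights).
    """
    if len(measurements) != len(sigmas) or not measurements:
        raise ValueError("need one standard deviation per measurement")
    inv_vars = [1.0 / s ** 2 for s in sigmas]
    total = sum(inv_vars)
    # Weights are normalized so that they sum to one (no measurement bias).
    weights = [iv / total for iv in inv_vars]
    fused_value = sum(w * z for w, z in zip(weights, measurements))
    # The fused variance depends only on the sensor variances.
    fused_var = 1.0 / total
    return fused_value, fused_var, weights


# Hypothetical readings: nearby means versus widely separated means.
near = grf_fuse_1d([10.0, 12.0], [2.0, 4.0])
far = grf_fuse_1d([10.0, 30.0], [2.0, 4.0])
print(near)
print(far)
```

Both calls return the same fused variance because the measurement values never enter its calculation, which is the limitation that motivates the heuristic extension.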

Heuristic Geometric Redundant Fusion (HGRF).
The desired heuristic would modify the GRF result so that reliable results are produced for all ranges of measurement disparity or inconsistency. It contains information about the separation of the means of the measurements when finding the uncertainty of the fused result [12]. The following cases show how the HGRF result changes as the separation between the measurements increases. The uncertainty of each measurement and of the fused result is shown in the form of an ellipsoid.
Figure 5 shows two measurements with no separation or inconsistency. For this case, the HGRF uncertainty region is equivalent to the uncertainty region generated by the GRF method.
Figure 6 shows two measurements that partially agree. The HGRF uncertainty region is larger than the GRF region, which is the same as in Figure 5; however, the HGRF uncertainty is still smaller than the uncertainty of the two sensors.
Figure 7 shows two measurements that are in disagreement, that is, inconsistent. The HGRF uncertainty region is larger than the uncertainty of each sensor, and thus the uncertainty increases. It can be observed that the GRF uncertainty remains the same as in the previous cases.
Figure 8 shows two measurements that are completely separated, which indicates measurement error. The resulting HGRF uncertainty ellipsoid covers the entire range of the measurement ellipsoids along the dimensions of measurement error. In this case, the increase in the uncertainty indicates the occurrence of measurement error.

Comparative Study.
To evaluate the difference between the four approaches mentioned previously, a comparative study was carried out. This study considers a mobile robot moving in a straight line with a constant velocity of 7.8 cm/s. The position of the robot is tracked using two measurements coming from two sensors: an optical encoder and a Hall-effect sensor. To detect the position of the robot, the readings coming from the sensors are fused using the SB, MB, GRF, and HGRF methods during 20 seconds of travel. The standard deviations of the optical encoder and the Hall-effect sensor are 2.378 cm and 2.260 cm, respectively. Therefore, the measurements coming from the optical encoder have a higher uncertainty.
Figure 9 shows the uncertainty curves of the sensors as well as the fused results at $t = 10$ s. It can be observed that the variances of the GRF and the SB are the same, and thus the uncertainty curves completely overlap. However, the HGRF result follows the heuristic specification and covers the uncertainty curves of the measurements. Compared to the SB and the GRF, the MB shows a response to the separation or inconsistency of the measurements, although not as strong as that of the HGRF. The fused mean of the four approaches is exactly the same value of 78.4 cm, while the mean values of the optical encoder and the Hall-effect sensor are 75.4 cm and 79.1 cm, respectively. The fused mean is biased towards the more accurate reading of the Hall-effect sensor.
To compare the different approaches in terms of calculation time, Figure 10 shows how the running time of each algorithm changes over 20 seconds. It can be seen that the HGRF method always takes longer than the other methods. The other approaches have approximately the same running time. It can be observed from Figure 11 that the HGRF shows a very large change in the fused variance due to its high sensitivity to measurement inconsistency. The MB also shows a response to the separation or inconsistency between the measurements; however, it is not as large as that of the HGRF. The fused variance using both the SB and the GRF is exactly the same, as seen by the perfectly overlapping curves, and the variance value does not change throughout the simulation time.
The error curves in Figure 12 show that the fusion result is generally more accurate, with fewer errors, compared to the error caused by the uncertainty of each measurement. The optical encoder shows a higher range of error than the Hall-effect sensor due to its higher uncertainty.
Table 1 summarizes the main differences between the four multisensor data fusion approaches. In general, the MB outperforms all the other approaches in terms of accuracy, time, and variance change.

Proposed Approach
The proposed approach to multisensor data fusion in this paper relies on combining a modified Bayesian fusion algorithm with Kalman filtering. The main reason for using the modified Bayesian fusion is its efficient performance, as demonstrated in the previous section. Three different techniques, namely, prefiltering, postfiltering, and pre-postfiltering, are described in the following subsections based on how filtering is applied to the sensor data, to the fused data, or both.

Modified Bayesian Fusion with Prefiltering (F-MB).
The first proposed technique is prefiltering (F-MB), which involves adding Kalman filters before the modified Bayesian fusion node. As illustrated in Figure 13 and shown in Algorithm 1, a Kalman filter is added to every sensor to filter out the noise from the sensor measurements. The filtered measurements are then fused together using the modified Bayesian approach to get a single result that represents the state at a particular instant of time.
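A minimal sketch of the prefiltering arrangement is given below. It is not Algorithm 1 itself: it assumes a scalar position state with a known displacement per sample, uses each filter's posterior variance as the variance of the filtered measurement passed to the fusion node, and assumes the inflation factor $f = m^2 / (m^2 - (z_1 - z_2)^2)$. The class and function names, the process noise variance `q`, the parameter `m`, and the random seed are hypothetical; the speed, sampling time, and standard deviations are the x-axis values of the case study.

```python
import random


class ScalarKalmanFilter:
    """One-dimensional Kalman filter with a known displacement input."""

    def __init__(self, x0, p0, process_var, meas_var):
        self.x, self.p = x0, p0
        self.q, self.r = process_var, meas_var

    def step(self, z, displacement=0.0):
        # Predict: move the state by the known displacement, grow the variance.
        x_pred = self.x + displacement
        p_pred = self.p + self.q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p_pred / (p_pred + self.r)
        self.x = x_pred + gain * (z - x_pred)
        self.p = (1.0 - gain) * p_pred
        return self.x, self.p


def modified_bayes_fuse(z1, var1, z2, var2, m):
    """Modified Bayesian fusion of two estimates given their variances."""
    diff_sq = (z1 - z2) ** 2
    if diff_sq >= m ** 2:
        raise ValueError("difference between inputs must be smaller than m")
    f = m ** 2 / (m ** 2 - diff_sq)
    v1, v2 = f * var1, f * var2
    return (v2 * z1 + v1 * z2) / (v1 + v2), 1.0 / (1.0 / v1 + 1.0 / v2)


# Illustrative run; q = 0.01 and m = 50 are assumed values.
rng = random.Random(1)
dt, speed, sigma1, sigma2 = 0.5, 7.8, 4.3, 6.8
u = speed * dt
kf1 = ScalarKalmanFilter(0.0, sigma1 ** 2, 0.01, sigma1 ** 2)
kf2 = ScalarKalmanFilter(0.0, sigma2 ** 2, 0.01, sigma2 ** 2)
fused = []
for k in range(1, 41):
    truth = k * u
    x1, p1 = kf1.step(rng.gauss(truth, sigma1), u)   # filter sensor 1
    x2, p2 = kf2.step(rng.gauss(truth, sigma2), u)   # filter sensor 2
    fused.append(modified_bayes_fuse(x1, p1, x2, p2, m=50.0))
print(fused[-1])
```

Each sensor keeps its own filter state, so a noisy sensor is smoothed before it can influence the fused estimate.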

Modified Bayesian Fusion with Postfiltering (MB-F).
The second proposed technique is postfiltering (MB-F), which involves adding a Kalman filter after the fusion node, as shown in Algorithm 2 and illustrated in Figure 14. The fusion node first fuses the measurements using the modified Bayesian approach to produce $x_{\text{int}}$. Kalman filtering is then applied to the fused state $x_{\text{int}}$ in order to filter out the noise from the fused estimate. The output of the Kalman filter represents the state at a particular instant of time as well as the variance of the estimated fused state.

Modified Bayesian Fusion with Pre-and Postfiltering (F-MB-F).
In this technique, a Kalman filter is applied both before and after the fusion node, as illustrated in Figure 15. The algorithm of this technique integrates Algorithms 1 and 2, as shown in Algorithm 3.
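The three arrangements differ only in where the filter sits relative to the fusion node, which the following sketch makes explicit with two flags. It is an illustration under stated assumptions rather than a transcription of Algorithms 1 to 3: the state is a scalar position with a known displacement `u` per sample, the variance handed from one stage to the next is the posterior variance of the previous stage, and the inflation factor is assumed to be $f = m^2 / (m^2 - (z_1 - z_2)^2)$. All names and the values of `q` and `m` are hypothetical.

```python
def kalman_step(x, p, z, r, u, q):
    """One predict/update cycle of a scalar Kalman filter; returns (x, p)."""
    x_pred, p_pred = x + u, p + q
    gain = p_pred / (p_pred + r)
    return x_pred + gain * (z - x_pred), (1.0 - gain) * p_pred


def mb_fuse(z1, v1, z2, v2, m):
    """Modified Bayesian fusion of two inputs with variances v1 and v2."""
    diff_sq = (z1 - z2) ** 2
    if diff_sq >= m ** 2:
        raise ValueError("difference between inputs must be smaller than m")
    f = m ** 2 / (m ** 2 - diff_sq)
    v1, v2 = f * v1, f * v2
    return (v2 * z1 + v1 * z2) / (v1 + v2), 1.0 / (1.0 / v1 + 1.0 / v2)


def run_pipeline(z1_seq, z2_seq, sigma1, sigma2, u, m, pre, post, q=0.01):
    """pre=True, post=False: F-MB; pre=False, post=True: MB-F; both: F-MB-F."""
    r1, r2 = sigma1 ** 2, sigma2 ** 2
    x1, p1 = 0.0, r1          # state of the filter on sensor 1
    x2, p2 = 0.0, r2          # state of the filter on sensor 2
    xf, pf = 0.0, r1          # state of the filter after the fusion node
    estimates = []
    for z1, z2 in zip(z1_seq, z2_seq):
        if pre:
            x1, p1 = kalman_step(x1, p1, z1, r1, u, q)
            x2, p2 = kalman_step(x2, p2, z2, r2, u, q)
            a, va, b, vb = x1, p1, x2, p2
        else:
            a, va, b, vb = z1, r1, z2, r2
        x_int, v_int = mb_fuse(a, va, b, vb, m)
        if post:
            # The fused value acts as the measurement of the second filter.
            xf, pf = kalman_step(xf, pf, x_int, v_int, u, q)
            estimates.append((xf, pf))
        else:
            estimates.append((x_int, v_int))
    return estimates


# Noise-free check: every arrangement should track the true position.
u = 3.9
truth = [k * u for k in range(1, 11)]
results = {
    name: run_pipeline(truth, truth, 4.3, 6.8, u, 50.0, pre, post)
    for name, (pre, post) in
    {"F-MB": (True, False), "MB-F": (False, True), "F-MB-F": (True, True)}.items()
}
for name, est in results.items():
    print(name, est[-1])
```

Feeding already filtered values into a second filter makes its input errors correlated over time, so the variance reported by the last stage of F-MB-F in this sketch should be read as indicative only.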

A Case Study: Mobile Robot Local Positioning
In this section, a case study of position estimation of a mobile robot is presented to demonstrate the efficacy of the proposed approach. Mobile robot positioning answers the question "Where is the robot?" Positioning techniques can be roughly categorized into relative position measurements (dead reckoning) and absolute position measurements. In the former, the robot position is estimated by applying to a previously determined position the course and distance traveled since. In the latter, the absolute position of the robot is computed from measuring the direction of incidence of three or more actively transmitted beacons, from artificial or natural landmarks, or from model matching, where features from a sensor-based map and the world model map are matched to estimate the absolute location of the robot [13].

Simulation.
To test the proposed approaches, a simulation was carried out using MATLAB, where it was required to locate the position of the robot by finding its x and y coordinates. It was assumed that two sensors were used to find how far the robot has moved in the x direction and another two sensors to find how far the robot has moved in the y direction. The sensors used in the simulation were assumed to have uncertainty modeled as white Gaussian noise. The standard deviations of the sensors used to find the x-coordinates of the robot are 4.3 cm and 6.8 cm, while the standard deviations of the sensors used to find the y-coordinates are 4.5 cm and 6.6 cm. In addition, it was assumed that the robot is moving with a speed of 7.8 cm/s in the x direction and 15.6 cm/s in the y direction. The sampling time is 0.5 s and the robot was simulated to move for 20 s.
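The simulated measurements can be generated along the following lines. This is a Python sketch of the setup, not the MATLAB code used for the reported results; it assumes that the robot starts at the origin, that the first sample is taken after one sampling period, and that the noise of the four sensors is independent. The function name, the dictionary layout, and the seed are hypothetical.

```python
import random

DT = 0.5                  # sampling time in seconds
DURATION = 20.0           # travel time in seconds
SPEED_X, SPEED_Y = 7.8, 15.6      # cm/s
SIGMAS_X = (4.3, 6.8)     # standard deviations of the two x sensors, cm
SIGMAS_Y = (4.5, 6.6)     # standard deviations of the two y sensors, cm


def simulate_measurements(seed=0):
    """Return one dict per sample with the true position and noisy readings."""
    rng = random.Random(seed)
    steps = int(round(DURATION / DT))
    samples = []
    for k in range(1, steps + 1):
        t = k * DT
        true_x, true_y = SPEED_X * t, SPEED_Y * t
        samples.append({
            "t": t,
            "true": (true_x, true_y),
            # White Gaussian noise added independently to each sensor.
            "zx": tuple(rng.gauss(true_x, s) for s in SIGMAS_X),
            "zy": tuple(rng.gauss(true_y, s) for s in SIGMAS_Y),
        })
    return samples


samples = simulate_measurements(seed=0)
print(len(samples), samples[-1]["true"])
```

Fixing the seed makes a run repeatable, which is convenient when the same measurement sequence has to be fed to several fusion techniques for comparison.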

Figure 15: Modified Bayesian fusion with pre- and postfiltering.

Evaluation Metrics.
The performance of the algorithms was evaluated based on five criteria.

CPU Running Time.
This represents the total processing time of the algorithm to estimate the position of the robot throughout the traveling time. It is desired to minimize this running time.

Residual Sum of Squares (RSS).
This represents the summation of the squared difference between the theoretical position of the robot and the estimated state at each time sample. The smaller the RSS, the more accurate the algorithm, because the estimated position of the robot is closer to the theoretical position. This is given by

$$\text{RSS} = \sum_{i} \left( \text{pos}_{\text{theoretical},i} - \text{pos}_{\text{estimated},i} \right)^2, \tag{12}$$

where $i$ runs over the time samples.
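The RSS computation is straightforward; a minimal sketch is shown below. The function name and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
def residual_sum_of_squares(theoretical, estimated):
    """Sum of squared differences between two equally long sequences."""
    if len(theoretical) != len(estimated):
        raise ValueError("sequences must have the same length")
    return sum((t - e) ** 2 for t, e in zip(theoretical, estimated))


# Illustrative values: residuals are 0, 0, and -2, so the RSS is 4.
rss_example = residual_sum_of_squares([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(rss_example)
```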

Variance (P).
This represents the variance of the estimated position of the robot. The variance reflects the performance of the filters in each algorithm.

Coefficient of Correlation.
This is a measure of association that shows how the state estimate of each technique is related to the theoretical state. The coefficient of correlation always lies between −1 and +1. For example, a correlation close to +1 indicates that the two data sets are very strongly positively correlated [14].
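Assuming that the coefficient meant here is the Pearson product-moment correlation, it can be computed as in the following sketch. The function name and the sample sequences are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# A perfectly linear relation gives +1; reversing one sequence gives -1.
times = [0.0, 1.0, 2.0, 3.0]
positions = [0.0, 7.8, 15.6, 23.4]
r_pos = pearson_correlation(times, positions)
r_neg = pearson_correlation(times, positions[::-1])
print(r_pos, r_neg)
```

Because both the theoretical and the estimated positions grow steadily with time in the case study, a high correlation mainly confirms that the estimate follows the trajectory; it is less sensitive to small errors than the RSS.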

Criterion Function (CF).
A computational decision-making method was used to calculate a criterion function that is a numerical estimate of the utility associated with each of the three proposed techniques. A weighting function $w$ (from 0 to 1) is defined for each criterion (time, RSS, and variance), depending on its importance. The three weights should sum to 1. The cost value $c$ (calculated from the experiments) of each technique is obtained, and finally CF is calculated as the weighted sum of the utility for each technique as follows:

$$\text{CF} = w_1 \frac{c_1}{c_{1\max}} + w_2 \frac{c_2}{c_{2\max}} + w_3 \frac{c_3}{c_{3\max}}, \tag{13}$$

where $w_1$, $w_2$, and $w_3$ are the weights of the time, RSS, and variance, respectively. These weights are adjusted according to the application and the preference of the user. In this case study, it was assumed that $w_1 = 1/3$, $w_2 = 1/3$, and $w_3 = 1/3$. The values $c_1$, $c_2$, and $c_3$ are the values obtained from the experiments for the time, RSS, and variance, respectively. The values $c_{1\max}$, $c_{2\max}$, and $c_{3\max}$ represent the maximum value achieved in each of the criteria: time, RSS, and variance, respectively. The objective is to minimize this function such that the algorithm produces accurate estimates in a short time with a minimum variance.
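The criterion function can be evaluated as in the sketch below, assuming that each cost is normalized by the maximum value of that criterion across the compared techniques. The function name, the technique labels "A" and "B", and the cost values are hypothetical and are not results from this paper.

```python
def criterion_function(costs, weights=(1.0 / 3, 1.0 / 3, 1.0 / 3)):
    """Weighted sum of costs, each normalized by its maximum.

    costs: dict mapping a technique name to a tuple (time, RSS, variance).
    Returns a dict mapping each technique name to its CF value.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_criteria = len(weights)
    # Largest value reached by any technique for each criterion.
    maxima = [max(c[i] for c in costs.values()) for i in range(n_criteria)]
    return {
        name: sum(w * c / c_max for w, c, c_max in zip(weights, values, maxima))
        for name, values in costs.items()
    }


# Made-up costs for two hypothetical techniques.
cf = criterion_function({"A": (2.0, 10.0, 4.0), "B": (4.0, 5.0, 2.0)})
best = min(cf, key=cf.get)
print(cf, best)
```

With equal weights, technique "B" in this made-up example has the lower CF because its advantage in RSS and variance outweighs its longer running time; changing the weights shifts that trade-off.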

Results and Discussion.
A simulation of 5000 iterations was carried out using MATLAB to compare the results of the proposed algorithms. Figure 16 shows the errors that are produced due to the noisy, uncertain measurements obtained from the sensors. In addition, the figure shows the errors that are produced when estimating the x-coordinates of the robot using the three proposed algorithms. The error represents the difference between the theoretical state and the output state of each algorithm at a particular sample time. The measurement errors are high compared to the estimation errors. Figure 17 shows the errors that are produced due to the noisy, uncertain measurements obtained from the sensors as well as the errors of the three proposed algorithms in estimating the y-coordinates of the robot. Similar to the results shown in Figure 16, the measurement errors are high compared to the estimation errors. Thus, the first interpretation of these results is that the proposed techniques provide better estimates than relying on the measurements directly.
Table 2 summarizes the average results of 5000 runs for the three evaluation metrics described previously. The minimal value of each criterion is shown in bold. Although MB has the shortest execution time, it has the largest RSS value compared to the other techniques, and this is reflected in the CF shown in Table 2. Figure 18 shows the CPU running time of each algorithm. It is clear that MB takes the shortest calculation time while F-MB-F takes the longest.
Figure 19 shows the estimated variance of each technique. It is clear that the variances of MB-F and F-MB-F converge earlier and to lower values than those of MB and F-MB. This demonstrates the efficiency of the Kalman filters in MB-F and F-MB-F. Table 2 shows that F-MB-F has a smaller variance than MB-F. Moreover, the estimates of the proposed techniques were all positively and strongly correlated with the theoretical values, with a correlation coefficient of +0.99.
It can be seen from Figure 20 that MB-F outperforms the other techniques, followed by F-MB-F. The poorest performance is observed for MB, which shows that combining fusion with Kalman filtering improves the estimation of the states.

Conclusion
Three techniques for multisensor data fusion have been discussed in this paper. These techniques combine a modified Bayesian data fusion algorithm with pre- and postfiltering of the data in order to handle the uncertainty and inconsistency problems of sensory data. A simulation has been carried out to estimate the position of a mobile robot by applying the proposed techniques to find its x and y coordinates. The simulation results show that combining fusion with filtering improves the residual sum of squares and the variance of the fused result. For future research, the case study presented in this paper will be applied to accurate real-time 2D localization of a newly built differential drive mobile robot equipped with optical encoders and Hall-effect sensors for relative position measurements.

Figure 10: Time taken to perform each algorithm throughout simulation time.

Figure 11: Trend of the fused variance throughout simulation time.

Figure 16: Measurements and estimated errors to find the x-coordinates of the moving robot.

Figure 17: Measurements and estimated errors to find the y-coordinates of the moving robot.

Figure 18: Time taken to perform each algorithm throughout simulation time.

Table 1: Comparison between different fusion techniques.