VSMURF: A Novel Sliding Window Cleaning Algorithm for RFID Networks

Radio Frequency Identification (RFID) is one of the key technologies of the Internet of Things (IoT) and is used in many areas, such as mobile payments, public transportation, smart locks, and environment protection. With the development of RFID technology, RFID systems have been deployed in different applications at a large scale. However, the performance of RFID equipment can be easily affected by the surrounding environment, such as electronic products and metal appliances. These can interfere with the RF signal, which makes the collection of RFID data unreliable. Usually, the unreliability of RFID source data includes three aspects: false negatives, false positives, and dirty data. False negatives are the key problem to be solved in this paper, as the probability of false positives and dirty data occurring is relatively small. This paper proposes a novel sliding window cleaning algorithm called VSMURF, which is based on the traditional SMURF algorithm. The proposed method combines the dynamic change of tags and the value analysis of confidence (δ). Experimental results show that the VSMURF algorithm performs better under most conditions, whether the tag's speed is low or high. In particular, if the velocity parameter is set to 2 m/epoch, which represents a typical situation, the proposed VSMURF performs better than SMURF. The same holds when the tag is moving fast. The results also show that the proposed VSMURF algorithm performs better than the other compared algorithms in solving the problem of false negatives for RFID networks.


Introduction
The Internet of Things (IoT) [1] is defined as the set of objects that can communicate over the Internet. Cloud computing is an Internet-based computing service that can provide task allocation [2] and secure data services [3] to IoT devices on demand [4]. This has led IoT to be a global technology used in many fields, such as ubiquitous cities [5]. With the development of these systems, data processing technology has attracted many researchers' interest. As an example, to meet the security requirements of ubiquitous cities, Shen et al. have proposed a sharing framework based on attribute-based cryptography to support dynamic operations for urban data [5]. Radio Frequency Identification (RFID) is considered one of the key technologies to realize IoT and is widely used in many areas, such as mobile payments, public transportation, health monitoring, environment protection, and smart cities [6,7]. More and more big data are generated by those types of RFID and Cloud-based systems, which makes spatial-temporal data management a current hot topic [8].
RFID uses radio frequency signals through wireless communications to achieve noncontact transmission of information and to perform automatic identification of objects attached with RFID tags. The advantage lies in the fact that the RFID tags and readers can perform identification without physical contact. An RFID system can be divided into the following three components: readers, tags, and a back-end computer system. The reader and tag communicate through an antenna. To this end, the reader firstly emits electronic signals through an antenna, and then the tag emits the identification information held in its internal storage after receiving the signal. Secondly, the reader receives and identifies the information sent back from the tag via an antenna. Finally, the reader sends the identification results to the computer system [9]. The working principle of an RFID system is shown in Figure 1.
In early RFID applications, the reader is directly connected to the application, and RFID data is processed as logical data by the application. This data processing approach meets the needs of earlier RFID systems, but the system's design is complex and the efficiency is low. An important problem with these legacy systems is that reuse is difficult. To reduce the complexity and meet the requirements of the rapid development of RFID technology, RFID middleware [10] systems were introduced. A middleware system works as a separate level which can perform some of the data processing and shield the reader hardware from the upper application system. In these scenarios, the program's scalability and applicability are greatly improved. RFID middleware can be used to clean, filter, and format the data collected by the readers and transfer the processed data to the back-end applications [11]. This paper mainly studies the data preprocessing method, which is also referred to as the data cleaning method, in RFID middleware systems.
Traditional data cleaning technology [12] is mainly applied in three fields: (1) data warehouse (DW or DWH); (2) Data Discovery or Knowledge Discovery in Data (KDD) [13]; (3) Total Data Quality Management (TDQM) [14]. In these areas, data cleaning is an integral and essential part of data processing. However, traditional data cleaning technology is only suitable for relational database or data warehouse technology. The RFID data flow is different from the traditional relational database and the data stream generated in a data warehouse. The interaction between the reader and the tag in the RFID system determines the following characteristics of RFID data [15,16]: the source data has a simple structure and is streaming, batched, massive in volume, temporal, dynamically changing, correlated, and unreliable. The way in which the data is generated by the RFID device means that the source data is often seriously unreliable [17,18]. The main reasons for the unreliability of data in RFID systems are as follows [19]:
(1) False positives: tags' data which should not be recognized by the reader have been read for some reason (noise, electromagnetic interference, and the like).
(2) False negatives: tags' data that should be recognized by the reader have not been correctly read.
(3) Dirty data: reader detects the tag that exists in its reading range, but the data which is read by the reader contains errors.
False positives and dirty data occur only occasionally, and the probability of their occurrence is relatively small. Instead, the false negative phenomenon is more common and is the main cause of unreliability in RFID systems. Therefore, in order to improve the quality of data and ensure that the upper application works effectively, a cleaning operation on the RFID source data is needed. Previous research on RFID data cleaning technology has been extensive and in-depth and has proposed some classic data cleaning algorithms. This paper proposes a novel data cleaning algorithm (named VSMURF), which is based on the traditional SMURF algorithm in RFID networks and is aimed at reducing the number of false negatives.
The paper is organized as follows: Section 2 describes the related work and the SMURF algorithm. Section 3 presents the proposed VSMURF algorithm. Section 4 describes simulation and experimental results. Finally, Section 5 summarizes the paper.

Related Work and SMURF Algorithm
In this section we describe the related work on data cleaning technology and recap the SMURF algorithm.

Data Cleaning Technology.
Data cleaning [20] refers to a process that can detect and correct identifiable errors in data files. Some classical data cleaning algorithms exist in the literature and are introduced in this section.
The data cleaning method based on a sliding window is a typical and commonly used method. For example, Bai et al. proposed a method based on a fixed sliding window [21], where the window size is fixed and the window moves forward over time. The cleaning process is shown in Figure 2. Here, the raw data is generated by the reader.
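The fixed-window idea can be illustrated with a minimal sketch. It assumes the common smoothing rule that a tag is reported in an epoch if it was read at least once within the window; the function name and the 0/1 stream encoding are illustrative and are not taken from [21].

```python
def fixed_window_clean(raw_reads, window_size):
    """Smooth a per-epoch 0/1 read stream with a fixed sliding window.

    Assumed rule: the tag is reported present at epoch t if it was read
    at least once in the last `window_size` epochs (including t).
    """
    cleaned = []
    for t in range(len(raw_reads)):
        start = max(0, t - window_size + 1)
        cleaned.append(1 if any(raw_reads[start:t + 1]) else 0)
    return cleaned


# The missed read at epoch 1 is filled in, but the tag is also reported
# for two extra epochs after its last real read at epoch 3.
raw = [1, 0, 1, 1, 0, 0, 0, 0]
print(fixed_window_clean(raw, window_size=3))
```

The example shows the trade-off that motivates adaptive windows: the same window that fills a false negative also produces false positives once the tag has left.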
Jeffery et al. proposed a scalable data cleaning framework [22] based on Extensible Sensor stream Processing (ESP), which is a declarative query processing tool that is easily pipelined and whose configuration can be deployed to each receptor. The framework is capable of cleaning data with the various features of different readers. The ESP pipeline is organized into the following five consecutive programmable stages: Point, Smooth, Merge, Arbitrate, and Virtualize. Gonzalez et al. proposed a dynamic data cleaning method based on Dynamic Bayesian Networks (DBNs) [23], which dynamically adjusts the trustworthiness state (that is, the probability of tag existence). Song of Liaoning University proposed a three-tier Kalman Filter-Based RFID Cleaning (KFBC) structure [24] to address the problem of false negatives in RFID systems. Figure 3 shows the Kalman filter update process, which consists of a time update and a measurement update and is an autoregressive process.

SMURF Algorithm.
Here, we firstly recall the basic concepts of the SMURF algorithm [25], and then we describe the algorithm in detail.
Interrogation Cycle. It is an inquiry and response process between a reader and a tag, which is the basic step of the reader's protocol in which the reader tries to detect all tags.

Reading Cycle. It is a set of multiple interrogation cycles. It is also named an epoch, the length of which is from 0.2 seconds to 0.25 seconds [25]. In each epoch, the reader keeps a record of how many and which tags are identified.
Read Rate $p_{i,t}$. It is the probability that tag $i$ is read by the reader at epoch $t$, which is calculated as

$p_{i,t} = \mathrm{Responses}_{i,t} / N_t$ (1)

where $N_t$ is the number of interrogation cycles at epoch $t$ and $\mathrm{Responses}_{i,t}$ is the number of responses of tag $i$ at epoch $t$.
The SMURF algorithm works as follows. In each epoch, the reader detects all tags in its reading range and stores the information of the tags in the tag list, including TagID, Responses, and Timestamp, as shown in an example in Table 1. The list can be represented by a triple (TagID, Responses, Timestamp), and all those pieces of information are transmitted to the reader's client at regular intervals.
Each epoch is treated as an independent Bernoulli experiment. In detail, the probabilities of tag $i$ appearing at each epoch of the window $W_i$ (of size $w_i$) are all the same and can be calculated using the formula

$p_i^{avg} = \frac{1}{|S_i|} \sum_{t \in S_i} p_{i,t}$ (2)

In addition, the following two formulas are the most important ones in the SMURF algorithm:

$w_i^* = \lceil \ln(1/\delta) / p_i^{avg} \rceil$ (3)

$\left| |S_i| - w_i p_i^{avg} \right| > 2 \sqrt{w_i p_i^{avg} (1 - p_i^{avg})}$ (4)

where $\delta$ denotes confidence in formula (3). The confidence method can be used to process and analyze various data, such as in image reconstruction. In the literature [26], the authors use the confidence method to remove noise from an image according to the confidence of the depth when computing the range map. Experimental results show that the confidence method is effective in data processing. In the SMURF algorithm, confidence is used to meet the integrity requirements of the data and ensure that the tag is successfully read in the sliding window $W_i$ according to formula (3). Formula (4) is obtained from the central limit theorem and is the control condition for setting the current sliding window size under the tag's dynamic change. A reasonable sliding window size needs to balance the integrity requirements against the tag's dynamic changes. In the SMURF algorithm, whether to output data is decided according to the data in the window, and how to adjust the window size is decided according to the requirements of integrity and dynamic detection. Therefore, SMURF is a single tag data cleaning algorithm. The SMURF algorithm is composed of the following steps.
Step 1. The initial setting of $w_i$ is 1 for every tag $i$ in the tag pool, and $w_i^*$ is calculated according to formula (3).
Step 2. If $w_i^* > w_i$, adjust the window size according to the following formula:

$w_i \leftarrow \max\{\min\{w_i + 2, w_i^*\}, 1\}$ (5)

Step 3. If $w_i^* \le w_i$, the dynamic change state of the tag is detected according to formula (4). If the tag's state has changed, the window size is adjusted according to the following formula:

$w_i \leftarrow \max\{\min\{w_i / 2, w_i^*\}, 1\}$ (6)
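The per-tag window adaptation in Steps 1 to 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the additive increase by two epochs and the halving follow the original SMURF formulation [25], the function names are hypothetical, integer halving is used, and the average read rate is assumed to be strictly positive.

```python
import math


def required_window(p_avg, delta):
    """Smallest window size meeting the completeness requirement (formula (3))."""
    return math.ceil(math.log(1.0 / delta) / p_avg)


def transition_detected(num_observed, w, p_avg):
    """CLT-based test (formula (4)): observed reads deviate from w * p_avg
    by more than two standard deviations of the binomial model."""
    expected = w * p_avg
    two_sigma = 2.0 * math.sqrt(w * p_avg * (1.0 - p_avg))
    return abs(num_observed - expected) > two_sigma


def smurf_adjust(w, num_observed, p_avg, delta):
    """Return the next window size for one tag (Steps 2 and 3)."""
    w_star = required_window(p_avg, delta)
    if w_star > w:
        # Window too small for completeness: grow additively.
        return max(min(w + 2, w_star), 1)
    if transition_detected(num_observed, w, p_avg):
        # Tag state changed: shrink multiplicatively (integer halving assumed).
        return max(min(w // 2, w_star), 1)
    return w


print(smurf_adjust(w=1, num_observed=1, p_avg=0.56, delta=0.05))
```

Starting from a window of one epoch, repeated calls grow the window by two epochs at a time until the completeness bound is reached or a transition is detected.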

Limitations of SMURF Algorithm.
In the RFID middleware system, the sliding window filtering method is the key technology to reduce false negatives. However, there are some shortcomings in the SMURF algorithm. Firstly, the size of the sliding window is difficult to set when tags move dynamically. If the window is set too large, it will generate false positives in the process of filling data; if the window is set too small, it will not be able to completely fill the data, which results in false negatives. The selection of the optimal sliding window size directly affects the effectiveness of the algorithm. In addition, the effect of this cleaning algorithm is remarkable only when the RFID data stream is in an ideal condition, which means that the RFID tags move at a uniform speed. However, this is not always the case, for example, when external environmental factors change (such as the tag moving rapidly into or out of the reader's detection range). In these cases, the cleaning performance is greatly reduced. Furthermore, the SMURF algorithm's confidence is just based on empirical evaluation; that is, it is not the result of a specific analysis. If the tag quickly leaves the reader's detection range, the value of $\delta$ has a great impact on the results [20]. Specifically, formula (3) shows how the sliding window size ($w_i$) is related to the two parameters $p_i^{avg}$ and $\delta$. In fact, $w_i$ is inversely proportional to $p_i^{avg}$, which means that the performance of the SMURF algorithm decreases rapidly when the tag moves out of or into the reader's detection range. Assume a raw data stream (0.5, 0.6, 0.6, 0.6, 0.5, 0.8, 0.8, 0.0, 0.4, and 0.8), where the initial window size ($w_i$) is set to 5 epochs, each epoch contains 10 interrogation cycles, and $\delta$ is set to 0.05. After calculation, $p_i^{avg}$ is 0.56. Because the sliding window size ($w_i$) is 5 epochs, the tag being processed has already moved out of the reader's detection range in the third epoch of the second window. In the end, the tag is still reported as being in the reader's detection range by the SMURF algorithm, which results in false positives in the RFID network. Therefore, the SMURF algorithm fails under this condition. For these reasons, we have designed a novel algorithm that takes into account the limitations of SMURF, which is described in the next section.
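The numbers in this example can be checked with a short script. It is only a sketch: the rule that a tag is reported whenever its window still contains a positive read rate is an assumption used to illustrate the false positive, and the variable names are illustrative.

```python
import math

stream = [0.5, 0.6, 0.6, 0.6, 0.5, 0.8, 0.8, 0.0, 0.4, 0.8]
window = 5
delta = 0.05

# Average read rate over the first 5-epoch window.
first = stream[:window]
p_avg = sum(first) / len(first)

# Window size required by formula (3) for this read rate and confidence.
w_star = math.ceil(math.log(1 / delta) / p_avg)
print(round(p_avg, 2), w_star)

# Epoch index 7 is the third epoch of the second window and has read rate 0.0,
# yet the 5-epoch window ending there still contains positive read rates,
# so a window-based cleaner keeps reporting the tag.
t = 7
window_reads = stream[t - window + 1:t + 1]
still_reported = any(r > 0 for r in window_reads)
print(window_reads, still_reported)
```

With these inputs the average read rate is 0.56 and formula (3) asks for a window of 6 epochs, so the window is large relative to how quickly the tag leaves.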

VSMURF: A Novel Sliding Window Cleaning Algorithm for RFID Networks
Based upon the previous analysis, we propose a novel algorithm, called VSMURF, to address the SMURF algorithm's limitations. In the following, we firstly describe the dynamic detection mechanism of tags and the analysis of confidence. Then, we detail our adaptive sliding window cleaning algorithm. The dynamic detection mechanism of a tag's movement is divided into three parts.
Mechanism 1. In order to determine when the tag leaves the reader's detection range, the algorithm introduces the sliding window $W_2$ ($|S_2|$) based on formula (4). The Least Squares Method [27] is used to fit the slope of the curve to determine whether the tag is moving out. If the slope is negative, the tag is moving out of the detection range. In addition, our proposed algorithm reduces the sliding window size by two epochs. The purpose of this is to reduce the occurrence of false negatives.
Mechanism 2. The sliding window will be reduced by half if the tag is moving out and cannot be detected in the sliding window $W_2$ ($|S_2| = 0$).
Mechanism 3. The sliding window size is increased if $w_i^*$, which is calculated according to formula (3), is larger than the current window size $w_i$ ($w_i^* > w_i$) and the actual number of readings is greater than the expected number of readings ($|S_i| > w_i \cdot p_i^{avg}$).
The above three mechanisms are used to adaptively adjust the size of the tag's sliding window. The tag's dynamic detection still uses the central limit theorem (CLT) [28]. In the initial state, the window of every tag to be detected is set to 1 epoch, and the window size is then adjusted to 3 epochs. The goal is to balance the efficiency of the proposed algorithm and reduce false positives. Similar to the SMURF algorithm, the proposed VSMURF algorithm outputs the results at the middle of the window and slides once at each epoch.
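The slope test of Mechanism 1 can be sketched with an ordinary least-squares fit. This is a minimal illustration, assuming the fitted curve is the sequence of per-epoch read rates plotted against the epoch index; the text does not state exactly which quantity is fitted, and the function names are hypothetical.

```python
def least_squares_slope(values):
    """Slope of the least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        return 0.0  # a single point has no trend
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


def is_moving_out(read_rates):
    """A negative fitted slope is taken as a sign that the tag is leaving."""
    return least_squares_slope(read_rates) < 0


print(is_moving_out([0.8, 0.6, 0.3, 0.0]))  # falling read rates
print(is_moving_out([0.2, 0.5, 0.8]))       # rising read rates
```

A fit over several epochs is less sensitive to a single missed read than comparing only the last two epochs, which is one plausible reason for using least squares here.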

Confidence 𝛿 Analysis.
In the SMURF algorithm, the formula $(1 - p_i^{avg})^{w_i} < \delta$ is the integrity requirement that ensures that the RFID data stream is completely covered, but it does not say which particular value of $\delta$ should be selected. The SMURF algorithm only provides an empirical value. When the value of $\delta$ is less than 0.5, it has only a small effect on the sliding window size [25]. In fact, the SMURF algorithm cleans the data effectively when the tag moves slowly and does not move into and out of the reader's detection range frequently. But when the tag moves quickly, the algorithm's error rate becomes high. The reason is that the sliding window size becomes larger when the confidence $\delta$ is smaller, which results in an increase in the number of false positive readings and a corresponding increase in the reading error rate. In order to solve the above problems, this paper takes into account the following factors that affect the cleaning results: the reader's detection range ($R$), reading frequency ($f$), the tag's speed ($v$), and confidence ($\delta$). All these factors dynamically affect the adjustment of the sliding window size. The maximum number of epochs is determined by $2R/(v/f) = 2Rf/v$ when the tag passes through the center of the circle (as shown in Figure 4).
When the tag is very close to the antenna of the reader's reading range, the number of readings is $n = Rf/v$. To improve the efficiency of data cleaning, the VSMURF algorithm dynamically adjusts the parameter $\delta$ ($\delta = p_i^{avg}/n$). Substituting this value into the integrity requirement gives formula (8). Applying the natural logarithm to both sides of that inequality gives formula (9), and the bound $\ln(1 - x) \le -x$, $x \in (0, 1)$, then yields formula (10). Thus, the integrity requirement is adjusted to formula (10).
A maximum threshold on $\delta$ can ensure that the error rate is less than 10% [20].
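The logarithm step in this analysis can be written out as a short derivation. This is a sketch for a generic confidence value $\delta$, with $p$ standing for the average read rate and $w$ for the window size; it shows why a bound of the form of formula (3) is sufficient for the integrity requirement and is not a reproduction of the paper's formulas (8) to (10).

```latex
\begin{align*}
(1 - p)^{w} < \delta
  &\iff w \ln(1 - p) < \ln \delta, \\
w \ln(1 - p) &\le -w\,p
  \qquad \text{because } \ln(1 - x) \le -x,\ x \in (0, 1), \\
-w\,p \le \ln \delta
  &\iff w \ge \frac{\ln(1/\delta)}{p}.
\end{align*}
```

Because $\ln(1 - x) < -x$ holds strictly for $x \in (0, 1)$, any $w \ge \ln(1/\delta)/p$ satisfies the integrity requirement. For example, with $p = 0.56$ and $\delta = 0.05$ the bound gives $w \ge 5.35$, so $w = 6$ epochs, and indeed $0.44^{6} \approx 0.007 < 0.05$.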

Adaptive Sliding Window Cleaning Algorithm VSMURF.
VSMURF is an improved single tag sliding window cleaning algorithm based on the SMURF algorithm, and it adjusts the window of tag $i$ at each epoch. The algorithm is composed of seven steps, which are described in detail in the following.
Step 1. Initialize the reader's detection range ($R$), reading frequency ($f$), confidence ($\delta$), and the threshold of the tag's maximum speed, and set the initial window to 1 epoch. When the tag's speed $v$ exceeds the threshold, $\delta$ is increased appropriately ($\delta < 0.25$). Otherwise, $\delta$ is reduced appropriately.
Step 2. Detect whether the reading cycles (epochs) have ended; if so, the algorithm ends.
Step 3. Calculate the minimum window size $w_i^*$ that satisfies the integrity requirement according to formula (3) and determine whether the tag is moving out by using least squares.
Step 4. If the tag is moving out and $|S_2| = 0$, the sliding window size is halved according to Mechanism 2. Then adjust the window size according to formula (11) and enter the next epoch.
Step 5. Check whether the window transforms according to formula (4). If the window transforms, adjust the window size according to formula (12) and enter the next epoch.
Step 6. If the current window size does not meet the integrity requirement ($w_i^* > w_i$) and the window does not transform ($|S_i| > w_i \cdot p_i^{avg}$), increase the window size according to Mechanism 3 and formula (13) and then enter the next epoch.
Step 7. If the current window satisfies the integrity requirement ($w_i \ge w_i^*$) and the window transforms ($|S_i| \le w_i \cdot p_i^{avg}$), the window size will not change.
Figure 5 shows the data process flow of VSMURF algorithm.
The pseudocode of the VSMURF algorithm is given in Pseudocode 1.
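The window adjustment in Steps 3 to 7 can be sketched as a single per-epoch update. This is a hedged illustration rather than the authors' pseudocode: the step sizes (halving, minus two epochs, plus two epochs capped at the required size) are assumptions standing in for formulas (11) to (13), the slope is fitted to the read rates in the window, and the function names are hypothetical.

```python
import math


def slope(values):
    """Least-squares slope of values against their indices 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    mx, my = (n - 1) / 2.0, sum(values) / n
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(values))
    sxx = sum((x - mx) ** 2 for x in range(n))
    return sxy / sxx


def vsmurf_adjust(w, rates, delta):
    """Return the next window size from the read rates of the last w epochs.

    rates[k] is the tag's read rate in the k-th epoch of the window
    (0.0 when the tag was not read). All step sizes are assumptions.
    """
    observed = [r for r in rates if r > 0]
    if not observed:
        return max(w // 2, 1)              # assumed: nothing read, shrink
    p_avg = sum(observed) / len(observed)
    w_star = math.ceil(math.log(1.0 / delta) / p_avg)    # formula (3)
    second_half = rates[len(rates) // 2:]
    moving_out = slope(rates) < 0                        # Mechanism 1
    # Step 4 / Mechanism 2: moving out and no reads in the second half.
    if moving_out and not any(r > 0 for r in second_half):
        return max(w // 2, 1)
    # Step 5: transition test of formula (4).
    expected = w * p_avg
    if abs(len(observed) - expected) > 2 * math.sqrt(expected * (1 - p_avg)):
        return max(w - 2, 1)
    # Step 6 / Mechanism 3: window too small and reads above expectation.
    if w_star > w and len(observed) > expected:
        return min(w + 2, w_star)
    return w                                             # Step 7


print(vsmurf_adjust(1, [0.8], 0.05))
```

With a single observed epoch the window grows from 1 to 3 epochs, which matches the initial adjustment described for the algorithm; the other branches depend on the assumed step sizes and should be replaced by the exact rules where they are available.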

Reader's Detection Model and Evaluation Mechanism.
In our experiments, the detection range of the reader is divided into the major detection range and the minor detection range [29] (as shown in Figure 6).
The reader detection model has the following characteristics: (1) The reader has a high detection probability and the reading rate is higher than 95% in the major detection range, which is near to the reader.
(2) The area that extends from the end of the major detection range to the end of the reader's detection range is called the minor detection range. Here, the reading rate is linearly reduced to 0.
The model uses two parameters to capture the behavior of readers under different conditions: the detection range and the percentage of the major detection range (MajorPercentage).

Figure 5: Data process flow of the VSMURF algorithm (flowchart; its decision branches follow Steps 4 to 6).

$\mathrm{AvgPositives} = \frac{\sum_{j=1}^{\mathrm{NumEpochs}} \mathrm{FalsePositives}_j}{\mathrm{NumEpochs}}$, where NumEpochs is the number of reading cycles; FalsePositives and FalseNegatives are the numbers of false positive readings and false negative readings, respectively. The evaluation mechanism includes two types of errors, which can effectively evaluate the performance of various algorithms.
The reading rate $p$ is defined by formula (15) as a function of the distance between the tag and the reader, the reader's major and minor detection ranges, and the reading rates of the major and minor detection ranges. The RFID reader is a Speedway R420 reader.
Figure 7 shows the RFID reader, reader's antenna, and tags used in our experiments. VSMURF is a single tag cleaning algorithm; hence, in the experiments, only one reader is used and the number of tags is set to 25. The experimental model of the VSMURF algorithm is shown in Figure 8. At some point, when the tag moves into the reader's detection range, $p$ is calculated at the current coordinate point. The value of $p$ is determined by the distance between the tag and the reader using formula (15). In addition, the system generates a random number $r$ (the value of $r$ is between 0 and 1), which is used to determine whether the tag's information is generated. If $p \ge r$, the tag is read; if $p < r$, the reading is a false negative.
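The simulated reading process can be sketched as follows. This is an illustration under stated assumptions: the read rate is modeled as constant inside the major detection range and linearly decreasing to 0 across the minor detection range, as described for the reader detection model; the function names and parameter names are hypothetical, and a seeded generator is used only to make runs repeatable.

```python
import random


def read_rate(distance, detection_range, major_percentage, major_read_rate):
    """Read rate at a given distance from the reader (assumed piecewise model)."""
    major_range = detection_range * major_percentage
    if distance <= major_range:
        return major_read_rate
    if distance >= detection_range:
        return 0.0
    # Minor detection range: decrease linearly from major_read_rate to 0.
    return (major_read_rate * (detection_range - distance)
            / (detection_range - major_range))


def simulate_reads(distances, detection_range, major_percentage,
                   major_read_rate, seed=0):
    """Return one 0/1 reading per epoch; a 0 while the tag is inside the
    detection range corresponds to a false negative."""
    rng = random.Random(seed)
    reads = []
    for d in distances:
        p = read_rate(d, detection_range, major_percentage, major_read_rate)
        # The tag is read with probability p.
        reads.append(1 if rng.random() < p else 0)
    return reads


# A tag moving away from the reader, one distance sample per epoch.
path = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
print(simulate_reads(path, detection_range=10.0,
                     major_percentage=0.5, major_read_rate=0.8))
```

Comparing such a simulated stream with the tag's true presence gives the false positive and false negative counts used by the evaluation mechanism.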

Results and Discussion
In this section we describe the experiments that we have performed. The results are shown in Figure 9. In detail, when MajorPercentage is 0 (such as in a very noisy environment), the large window performs better, with about 4 errors per epoch. As MajorPercentage increases, the reliability of the original data and the accuracy of each algorithm are improved. The small window performs better when MajorPercentage reaches 1, that is, when the major detection range covers the entire detection range of the reader. In this experiment, the VSMURF cleaning algorithm has a noticeable efficiency improvement compared with the other algorithms.
In order to simulate realistic conditions in terms of tag and reader behavior, we have implemented the SMURF and VSMURF algorithms using R420 RFID readers. In the experiments, we have collected 12 sets of real data. Because the number of reading cycles is set to 1000, each data set spans 1000 epochs. In the experiments, the execution time is recorded for each set of data. The timing results are shown in Table 3. From Table 3 and Figure 9, we can see that when the tag speed is high and the tag moves out of the reader's detection range frequently, the execution time of VSMURF is slightly higher than that of the SMURF algorithm because the VSMURF algorithm adds the confidence judgment. However, the average error rate of VSMURF is lower than that of the SMURF algorithm.
In addition, the overhead of the VSMURF algorithm with respect to the traditional fixed window and SMURF algorithms is minimal because our VSMURF algorithm only utilizes simple mathematical operations. MajorReadRate is 0.8, and the number of reading cycles is 1000 epochs. The fixed window cleaning algorithm, the SMURF algorithm, and the VSMURF algorithm are compared in this experiment. The average error rate of each algorithm is compared at different speeds. The parameter settings are shown in Table 4.
As shown in Table 5 and Figure 10, all algorithms work well when the tags' speed is less than 0.5, because the tag is in a stable environment at this speed. Static 10, which uses a large window to eliminate more false negative readings, obtains the best performance when the tag is static, and the number of errors per epoch is less than 1. As the tag's speed increases, the average number of errors for static 10 starts to increase and the efficiency begins to decline. This is because the large window causes more false positive readings. In contrast, the small window static 2 cannot completely make up for false negative reading errors. However, it does not cause many false positive readings, and its overall performance is better. SMURF performs well when the tag's speed is low. As the speed increases, that is, as the tag moves into or out of the detection range frequently, the efficiency of the algorithm is seriously degraded. The VSMURF algorithm works better under most conditions and, regardless of whether the tag is at low or high speed, its average number of errors is the lowest of all the compared algorithms, as confirmed by the results reported in Table 5 and Figure 10.

Conclusion
This paper has presented a new sliding window cleaning algorithm, VSMURF, which is based on the tag's dynamic property and confidence. The paper is based on the observation that the SMURF algorithm performs well only when the tag's speed is low. However, if the tag's speed increases and the tag moves into or out of the detection range frequently, the efficiency of the SMURF algorithm declines dramatically.
To address this limitation, we have introduced the VSMURF algorithm and we have shown that it performs better under most conditions, whether the tag's speed is low or high.
In particular, if the velocity parameter is set to 2 m/epoch, which represents a typical situation, our proposed VSMURF performs better than SMURF. The same holds when the tag is moving fast. The experimental results show that the proposed method is better suited than other similar algorithms to reducing the problem of false negatives. As future work, we intend to implement VSMURF in an RFID middleware system.

Figure 1: Working principle of RFID system.

Figure 2: Data cleaning method based on fixed sliding window.

The Dynamic Detection Mechanism of Tags. Assume that the current window of tag $i$ is $W_i = (t - w_i, t]$. $W_i$ is divided into two parts, where the first part is denoted as $W_1 = (t - w_i, t - w_i/2]$ and the second part is denoted as $W_2 = [t - w_i/2, t]$, and the binomial distribution samples are, respectively, denoted as $|S_1|$ and $|S_2|$.
(i) Detection range: the distance between the reader and the boundary of its detection range.
(ii) Percentage of major detection range (MajorPercentage): the percentage of the entire detection range that is covered by the major detection range of the reader.

Figure 10: Results of average number of errors of compared algorithms when tags' speed changes.
$S_i$. It denotes the subset of epochs in which tag $i$ is observed by the reader, which can be expressed by $S_i \subseteq W_i$. In the general case, it is assumed that tag $i$ is seen in a subset of all the epochs in $W_i$.
In the read rate formula, the denominator is the number of interrogation cycles at epoch $t$, and Responses is the number of responses of tag $i$ at epoch $t$ in the reader's reading range.
Sliding Window Size $w_i$. The value denotes the $w_i$ epochs in which tags can be identified by readers. The window can be expressed by $W_i = (t - w_i, t]$.

Table 1: List of tags.

Table 3: The execution time of SMURF and VSMURF algorithm.

Table 5: Comparison of average errors in different speeds.