A Lossless Compression Approach Based on Delta Encoding and T-RLE in WSNs

Sending and receiving data (data communication) is the most power-consuming activity in wireless sensor networks (WSNs), because sensor nodes depend on batteries of limited capacity that are generally not rechargeable. Data compression is among the techniques that can reduce the amount of data exchanged between wireless sensor nodes, resulting in power savings. Nevertheless, effective methods for improving the efficiency of data compression algorithms, and thereby the energy efficiency of the nodes, are still lacking. In this paper, we propose a novel lossless compression approach based on delta encoding and the two occurrences character solving (T-RLE) algorithm. T-RLE is an optimization of the RLE algorithm that aims to improve the compression ratio. The method reduces storage cost and the bandwidth needed to transmit the data, which positively affects the lifetime of the sensor nodes and of the network in general. We used real deployment data (temperature and humidity) from the SensorScope project to evaluate the performance of our approach. The results showed a significant improvement compared with some traditional algorithms.


Introduction
Wireless technologies offer new perspectives in the field of telecommunications and computer networks. As a result of this progress, a new type of ad hoc network has emerged, called the wireless sensor network [1]. These networks have no fixed infrastructure; they can be deployed quickly in sensitive and/or hard-to-reach areas. Their mission is most often to monitor an area, take regular measurements, and raise alarms to specific nodes of the network, known as collecting nodes (sinks), which can relay the information over long distances to a remote control center [2]. The intrinsic characteristics of this new generation of microsensors (small size, processing capacity, wireless communication, low cost, and a diversity of sensor types such as optical, thermal, and multimedia) have opened up new and varied applications for sensor networks in many fields (military, home automation, environmental monitoring, etc.) [3]. However, they raise just as many research problems, both through the applications they suggest and through the constraints they impose. It is widely accepted that the techniques and approaches developed for wired and conventional wireless networks are not directly "transferable" to sensor networks [4]. WSNs therefore remain an open domain for the scientific community at almost every level: autoconfiguration, localization, coverage, deployment, communication, dynamic topology, data collection and dissemination, queries, and processing. Most work on sensor networks aims to reduce, or at least rationalize, energy consumption [5]. Indeed, sensor networks are in most cases intended to record information in hostile or hard-to-reach environments without any human intervention, so it is difficult to find a source of energy other than batteries [6]. For this reason, sensor nodes are considered autonomous devices. 
A node's lifetime is therefore equal to the life of its battery, which makes energy consumption a critical constraint in sensor networks, where energy is considered a valuable resource [7]. Controlling the energy consumption of sensor node components has led to several energy conservation methods, one of which is data compression [8].
The sensor nodes in a WSN consume energy during the acquisition, processing, and transmission phases. However, the power a node spends in its communication module to transmit and receive data is higher than the power required for processing. An important approach to conserving energy and maximizing network lifetime is therefore the use of efficient data compression. Data compression methods reduce the size of the data before it is transmitted over the wireless channel, which reduces total power consumption. These energy savings translate directly into a longer lifetime for the nodes and for the network in general. It has been shown that the energy saved by not transmitting one byte of data pays for between four thousand (with a Chipcon CC2420 transceiver) and two million (with a MaxStream XTend transceiver) cycles of computation [9].
In general, WSNs are used to monitor natural phenomena, so consecutive observations of a sensor node are correlated; this is especially true of weather measurements such as temperature and humidity. In this work, we present an optimized lossless compression approach based on delta encoding and a modified RLE algorithm (T-RLE). The main idea of the method is to increase the similarity of the data by applying delta encoding and then to compress it with the T-RLE algorithm. The rest of the paper is organized as follows: Section 2 presents related works, Section 3 describes the proposed method, Section 4 analyzes the results, and Section 5 concludes the paper.

Related Works
This section presents the state of the art on energy conservation using compression algorithms in this particular type of network. Lossless compression techniques can be divided into three families. The first is dictionary-based, such as Lempel-Ziv 77 (LZ77) and Lempel-Ziv-Welch (LZW). The LZ77 algorithm searches a horizon (a sliding window over previously seen data) for matches that help with compression. This method was improved by the LZ78 algorithm, which uses an auxiliary data structure, called a dictionary, to store repeated sequences. A repeated sequence is then represented by its index in the dictionary rather than by its position, giving tuples of the form <index, &>, where & is the character that follows the repeated sequence [10]. LZW is a further improvement of LZ78 that eliminates the tuples and emits only the indexes of sequences in the dictionary, <index> [11]. However, LZW has some problems that affect its efficiency: it keeps adding entries to the dictionary whether or not they will ever be used, so the dictionary can become full, and searching it takes time. The second family is entropy-based, such as Huffman encoding [12]. Briefly, Huffman's method counts the number of occurrences of each sequence and assigns shorter codes to the most frequent sequences and longer codes to the less frequent ones. The third family, probabilistic-based, combines the two previous techniques, as in GZIP and the Lempel-Ziv-Markov chain Algorithm (LZMA) [13]. Because of the limited processing and storage resources of sensor nodes, only a few research papers have discussed embedding lossless compression algorithms in WSNs [14]. These algorithms cannot be implemented directly; they must be adapted to the requirements of WSNs [15]. 
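To make the dictionary mechanism concrete, here is a minimal Python sketch of LZW compression and decompression. It is an illustrative implementation written for this section, not code from any of the cited works; the `max_dict_size` parameter and the policy of freezing the dictionary once it is full are our assumptions, chosen to show one simple way of bounding the dictionary growth discussed above.

```python
def lzw_compress(data, max_dict_size=4096):
    """Compress a bytes object into a list of dictionary indexes (LZW)."""
    # Start with one entry per possible byte value.
    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate  # keep extending the longest known sequence
        else:
            codes.append(dictionary[current])
            # Add the new sequence only while there is room (freeze when full).
            if len(dictionary) < max_dict_size:
                dictionary[candidate] = len(dictionary)
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes, max_dict_size=4096):
    """Rebuild the bytes, growing the dictionary exactly as the compressor did."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # Code refers to the entry the decoder is about to add (cScSc case).
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        if len(dictionary) < max_dict_size:
            dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


message = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
assert lzw_decompress(codes) == message
print(len(message), len(codes))
```

With `max_dict_size=256` no new entries can ever be added, so every byte is emitted as its own code; this is the degenerate end of the memory/compression trade-off that sensor-oriented variants must balance.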
For example, the adaptation of LZW for sensor networks, called S-LZW (LZW for sensor nodes), made it possible to evaluate a dictionary-based data compression algorithm for WSNs. However, building a data dictionary is memory-intensive, which is an obstacle to using such techniques on sensors with limited memory. To adapt LZW to sensors, S-LZW therefore revolves around three correlated points: limiting the size of the dictionary, limiting the size of the data to be compressed, and defining the procedure to follow when the dictionary is full [16]. Marcelloni and Vecchio proposed an efficient lossless entropy compression (LEC) algorithm [17] that exploits the high temporal correlation between consecutive readings gathered by the sensors. LEC is one of the earliest algorithms that deal with multiple data types. Applied to real datasets, it achieved good results, with compression ratios above 70%, although its energy savings were less satisfactory. Moreover, the algorithm cannot adapt to changes in the correlation of the sensor readings, and it needs prior knowledge of statistical characteristics of the data, such as entropy. These shortcomings motivated the adaptive lossless data compression (ALDC) algorithm presented by Kolo et al. [18]. ALDC offers several code options to achieve optimal performance, but its compression efficiency is lower than that of LEC. The same authors also introduced a lightweight algorithm called FELACS (fast and efficient lossless adaptive compression scheme) [9]. FELACS uses Rice coding to compress blocks of data, and adaptivity is obtained by offering two block-size options and a variable compression ratio. FELACS is highly robust against data loss and uses little memory. 
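FELACS builds on Rice coding. The Python sketch below shows only that building block: a Rice encoder for one block of signed residuals, with the parameter k chosen per block by brute force. The zigzag mapping of signed values, the `max_k` bound, and the exhaustive choice of k are illustrative assumptions; they are not taken from the FELACS paper, and the sketch omits its block-size options and packet format.

```python
def zigzag(n):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def rice_encode(value, k):
    """Rice code of a non-negative integer: unary quotient, a '0' stop bit, then k low bits."""
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    bits = "1" * quotient + "0"
    if k > 0:
        bits += format(remainder, f"0{k}b")
    return bits


def encode_block(residuals, max_k=8):
    """Pick the k giving the shortest output for this block; return (k, bit string)."""
    mapped = [zigzag(r) for r in residuals]
    best_k = min(
        range(max_k + 1),
        key=lambda k: sum(len(rice_encode(m, k)) for m in mapped),
    )
    return best_k, "".join(rice_encode(m, best_k) for m in mapped)


print(encode_block([2, -1, 0, 3, -2]))  # (1, '1100010011100101')
```

Small residuals, such as those produced by differencing correlated readings, get short codes with a small k, which is why this family of coders suits slowly varying sensor data.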
FELACS also operates on a single sensing attribute, which results in fewer collisions and retransmissions. A lossless neighborhood indexing sequence (NIS) algorithm was proposed for data compression in WSNs in [19]. NIS uses a traversal method based on 0s and 1s: it generates two codewords for each character and chooses the shorter one to represent it. The authors claim that it is highly robust to packet losses, since every packet is decompressed independently.
The main point these related works have in common is that they test their algorithms on the same real-world datasets that we use. Before applying their compression algorithms, however, they converted the physical measures of humidity and temperature into raw_h and raw_t using the inverted versions of the conversion functions given on the SensorScope project website [20]. The only difference in our case is that we did not convert the data to raw values; we used the data directly as published on the SensorScope project website, as in our previous work [21]. That work was built on several real-world datasets from the SensorScope project, all of which are floating-point datasets. Its main idea is to compress the integer part and the fractional (float) part of each value separately, using RLE followed by Bzip2 for the integer part and Bzip2 alone for the float part. Its results showed a significant improvement compared with other algorithms. Compared with conventional compression approaches, we observed a distinctive characteristic of temperature and humidity data: repeated integer parts appear frequently in the datasets used. The new approach exploits this property of real datasets with a modified RLE (T-RLE). T-RLE handles values that occur once or twice in a row, for which RLE gives no size reduction (or even increases the size); in that situation, compression becomes pure overhead that consumes node energy without any benefit. The results show the benefit of the new approach in terms of compression ratio compared with other algorithms, which means less storage space, less transmission time, less bandwidth, and less energy consumption.
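The splitting idea of the previous work can be sketched in Python as follows. This is a hypothetical illustration, not the code of [21]: the readings are made-up values, and the text serialization (space-separated tokens, a newline between the RLE variables and repetitions) is our assumption. The sketch splits each reading at the decimal point, applies RLE and then Bzip2 (Python's standard `bz2` module) to the integer parts, and Bzip2 alone to the float parts.

```python
import bz2
from itertools import groupby


def split_parts(readings):
    """Split readings such as '23.45' into integer ('23') and float ('45') tokens."""
    integer_parts, float_parts = [], []
    for reading in readings:
        integer, _, fraction = reading.partition(".")
        integer_parts.append(integer)
        float_parts.append(fraction)
    return integer_parts, float_parts


def rle(tokens):
    """Classic RLE kept as two text streams: run values, then run lengths."""
    variables, repetitions = [], []
    for value, run in groupby(tokens):
        variables.append(value)
        repetitions.append(str(sum(1 for _ in run)))
    return " ".join(variables) + "\n" + " ".join(repetitions)


def rle_bzip2(readings):
    """Integer parts: RLE then Bzip2. Float parts: Bzip2 only."""
    integer_parts, float_parts = split_parts(readings)
    integer_stream = bz2.compress(rle(integer_parts).encode("ascii"))
    float_stream = bz2.compress(" ".join(float_parts).encode("ascii"))
    return integer_stream, float_stream


readings = ["23.45", "23.50", "23.61", "24.05", "24.12"]  # made-up temperatures
integer_stream, float_stream = rle_bzip2(readings)
print(bz2.decompress(integer_stream).decode("ascii"))
```

For such a tiny input, the Bzip2 streams are larger than the text itself because of the format's fixed header overhead; any gain only appears on realistically sized datasets.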

Proposed Method
The proposed method is an optimization of our recent work. We keep its main idea, which is to split each value into its integer and float parts and to compress them with different methods. The improvements made in this work are as follows. First, we apply the delta encoding algorithm in a preprocessing step to make the data more convergent, which increases the similarity rate of the data, as shown later.
Second, we apply a proposed modified RLE (T-RLE) to the integer part to handle values that occur exactly twice in a row, which we observed in abundance in the previous work. Figure 1 shows a flowchart of the proposed method; the new phases are described below.
3.1. Delta Encoding. Delta encoding, also known as relative encoding, is a lossless compression algorithm with a simple concept: it computes the difference between successive samples [22], as shown in (1):

$$d_1 = x_1, \qquad d_i = x_i - x_{i-1}, \quad i = 2, \dots, n \qquad (1)$$

where $x_i$ is the $i$-th sample, $d_i$ the corresponding encoded value, and $n$ the number of samples. We process these differences instead of the original data. This step had a significant impact on the similarity ratio of the datasets, because the range of the new datasets is smaller than that of the original datasets.
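Delta encoding and its inverse (a running sum) can be sketched in Python as follows. The readings are hypothetical values, and `decimal.Decimal` is our choice to keep the round trip exactly lossless (binary floats would introduce rounding errors in the differences); the paper does not specify how the float values are represented.

```python
from decimal import Decimal
from itertools import accumulate


def delta_encode(samples):
    """Keep the first sample, then store each difference x[i] - x[i-1]."""
    return samples[:1] + [cur - prev for prev, cur in zip(samples, samples[1:])]


def delta_decode(deltas):
    """Invert delta encoding with a running sum."""
    return list(accumulate(deltas))


readings = [Decimal(s) for s in ["21.30", "21.35", "21.35", "21.40", "21.38"]]  # hypothetical
deltas = delta_encode(readings)
print([str(d) for d in deltas])  # ['21.30', '0.05', '0.00', '0.05', '-0.02']
assert delta_decode(deltas) == readings
```

After the first sample, every value is a small number close to zero, which is what makes the later run-length and Bzip2 stages more effective.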

Modified RLE (T-RLE).
The RLE algorithm has a simple strategy: it replaces a sequence of repeated characters with the character and its number of occurrences [23]. At first glance, it seems to save a lot of space, and indeed characters with three or more consecutive occurrences give a positive reduction. On the other hand, a character with a single occurrence gives a negative reduction, because more characters are needed to represent it: instead of one character, RLE produces (character: 1, Repetition: 1), that is, two characters to represent the original one. This case has been mentioned in [24].
One application of the RLE algorithm is to separate the runs (repetitions) from the values (variables) for further compression, which is useful when dealing with numbers such as temperature and humidity. Consider the following sequence of 26 characters (counting the separating spaces):

(1 1 1 3 5 5 7 9 9 11 13 13)

Applying RLE gives:

Variables (1 3 5 7 9 11 13)
Repetition (3 1 2 1 2 1 2)
Total: 28 characters

RLE 1 has been applied to text data (containing letters). In its output, a repeated value is written twice (for example, 55) and its run length is stored in the repetition output:

Variables (11 3 55 7 99 11 1313)
Repetition (3 2 2 2)
Total: 27 characters

As this example shows, RLE 1 is not suitable for numerical data, because values are often read incorrectly: the first variable, 11, can be interpreted either as the number eleven occurring once or as a repeated digit 1, and resolving this requires access to the repetition output. We proposed a solution to this problem, modified RLE 1, which adds a comma (,) after each variable that has a single occurrence:

Variables (1 3,5 7,9 11,13)
Repetition (3 2 2 2)
Total: 22 characters

Characters with two occurrences still give no reduction, since the representation has as many characters as the original: RLE produces (character: 1, Repetition: 2). We treat this case in our proposed T-RLE. Algorithm 1 presents the pseudocode of the T-RLE algorithm.
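The character counts in this example can be checked with a short Python sketch of classic RLE that keeps the variables and the repetitions as two space-separated streams. Counting every character, separating spaces included, is our reading of how the totals above are obtained.

```python
from itertools import groupby


def rle_streams(tokens):
    """Classic RLE: one variable and one repetition count per run, as two text streams."""
    variables, repetitions = [], []
    for value, run in groupby(tokens):
        variables.append(value)
        repetitions.append(str(sum(1 for _ in run)))
    return " ".join(variables), " ".join(repetitions)


data = "1 1 1 3 5 5 7 9 9 11 13 13"
variables, repetitions = rle_streams(data.split())
print(variables)      # 1 3 5 7 9 11 13
print(repetitions)    # 3 1 2 1 2 1 2
print(len(data), len(variables) + len(repetitions))  # 26 28
```

Only the run of three 1s shrinks; the runs of length one grow and the runs of length two break even, which is exactly the situation T-RLE targets.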
Applying T-RLE to the same example gives:

Variables (1 3,5&7,9&11,13&)
Repetition (3)
Total: 17 characters

For characters with more than two occurrences, T-RLE works like the RLE algorithm: in the variable output, a space (⌴) after the character indicates that it has more than two occurrences, and its repetition count is saved in the repetition output. As the example shows, a comma (,) marks a character with one occurrence and an ampersand (&) marks a character with two occurrences, so neither needs an entry in the repetition output. Run-length encoding can be expressed in multiple ways to accommodate the properties of data such as weather measurements (temperature and humidity); we proposed T-RLE to fit this kind of data, especially numbers that occur once or twice.
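A minimal Python sketch of T-RLE, reconstructed from the worked example rather than from Algorithm 1: each value in the variable output is followed by a marker (a comma for one occurrence, an ampersand for two, a space for more than two, in which case the run length goes to the repetition output). Treating values as text tokens, and the decoder itself, are our assumptions; the sketch reproduces the 17-character result above.

```python
from itertools import groupby

ONE, TWO, MANY = ",", "&", " "  # markers read off the worked example


def t_rle_encode(tokens):
    """Encode text tokens; only runs longer than two store a repetition count."""
    variables, repetitions = [], []
    for value, run in groupby(tokens):
        count = sum(1 for _ in run)
        if count == 1:
            variables.append(value + ONE)
        elif count == 2:
            variables.append(value + TWO)
        else:
            variables.append(value + MANY)
            repetitions.append(str(count))
    return "".join(variables), " ".join(repetitions)


def t_rle_decode(variables, repetitions):
    """Rebuild the token list by reading each value up to its marker."""
    counts = iter(int(r) for r in repetitions.split())
    tokens, value = [], ""
    for ch in variables:
        if ch == ONE:
            tokens.append(value)
        elif ch == TWO:
            tokens.extend([value, value])
        elif ch == MANY:
            tokens.extend([value] * next(counts))
        else:
            value += ch
            continue
        value = ""  # a marker ends the current value
    return tokens


data = "1 1 1 3 5 5 7 9 9 11 13 13".split()
variables, repetitions = t_rle_encode(data)
print(repr(variables), repr(repetitions))  # '1 3,5&7,9&11,13&' '3'
print(len(variables) + len(repetitions))   # 17
assert t_rle_decode(variables, repetitions) == data
```

The decoder only works if values never contain the marker characters; this holds for integer parts of readings, including negative ones produced by delta encoding.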

Bzip2.
Bzip2 is an alternative to GZIP. Instead of LZ77-style dictionary matching, it uses the Burrows-Wheeler transform (BWT) followed by Huffman coding. The advantage of Bzip2 is its greater compression capability compared with LZ77/LZ78-based compressors [25].
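To illustrate the Burrows-Wheeler transform that Bzip2 relies on, here is a naive Python sketch that sorts all rotations of the input (with a `$` end marker, an assumption of this sketch) and keeps the last column, which tends to group identical characters together. Real Bzip2 uses far more efficient sorting and adds further stages; for actual compression, Python's standard `bz2` module wraps the bzip2 library.

```python
import bz2


def bwt(text, end_marker="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if end_marker in text:
        raise ValueError("end marker must not occur in the input")
    s = text + end_marker
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, end_marker="$"):
    """Naive inverse: repeatedly prepend the last column and re-sort."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(end_marker))
    return original[:-1]


print(bwt("banana"))  # annb$aa

payload = b"21 21 21 22 22 21 21 21" * 20
compressed = bz2.compress(payload)
assert bz2.decompress(compressed) == payload
```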

Dataset Description.
To evaluate the proposed approach, we used real deployment datasets from the SensorScope project, described in Table 1. SensorScope is an environmental monitoring system used in several projects whose data are publicly available for research purposes. The datasets contain weather measurements with many parameters (wind speed, wind direction, etc.); in our case, we process only the temperature and humidity measurements. We compared the performance of our approach with our previous method (RLE-Bzip2) and with traditional algorithms such as GZIP, Bzip2, LZMA, LZ77, and LZW.

Dispersion Measures.
To show the benefit of applying delta encoding, we plot the frequency distribution of the original data and of the data after delta encoding in Figures 2 and 3, respectively. We also calculate some dispersion measures, which are numerical measures of the degree of homogeneity (convergence) or dispersion (divergence) of samples; they are used to describe a dataset and to compare different datasets [26]. Specifically, we calculated the mean (2), the variance (3), and the standard deviation (4) to get a quick idea of how dispersed the data are and how convergent the samples are in the original datasets and after delta encoding, as Table 2 shows.
The results show the clear effect of delta encoding on the dispersion measures of all the datasets. The mean, which indicates the center of a dataset, is approximately 0 in all the delta-encoded datasets, and the variance and standard deviation are smaller in all of them. The smaller the variance and the standard deviation, the lower the dispersion and the higher the convergence of the samples.

Wireless Communications and Mobile Computing

We also plotted the frequency distributions to observe the data distribution before and after delta encoding. All the plots show that, after delta encoding, the samples are concentrated around 0. This high similarity of the data is the main aim of applying delta encoding.
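The effect described above can be reproduced on synthetic data with Python's standard `statistics` module. The readings below are a made-up daily temperature cycle (96 samples, one every 15 minutes), not SensorScope data, and the population forms `pvariance`/`pstdev` are our assumption, since the exact forms of (2) to (4) are not reproduced here.

```python
import math
from statistics import mean, pstdev, pvariance

# Hypothetical daily cycle: 20 degC +/- 5 degC, rounded to two decimals like sensor output.
readings = [round(20 + 5 * math.sin(2 * math.pi * i / 96), 2) for i in range(96)]
# Differences between successive samples; the first sample, which delta
# encoding keeps as-is, is left out of the statistics.
deltas = [current - previous for previous, current in zip(readings, readings[1:])]

for name, series in (("original", readings), ("delta", deltas)):
    print(f"{name:>8}: mean={mean(series):7.3f} "
          f"variance={pvariance(series):7.3f} std={pstdev(series):6.3f}")
```

The delta series has a mean close to 0 and a much smaller spread than the original readings, mirroring the trend reported in Table 2.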

Packet Compression Ratio.
The compression ratio (CR) is the most important metric for evaluating any compression algorithm. To avoid repeating the same results for the compression ratio and the packet compression ratio (PCR), we calculate only the CR, which is defined in terms of CD and UCD, the sizes of the compressed and the original data, respectively. Each packet can contain at most 29 bytes of payload [13], so we can count the number of packets required to transmit the compressed and the original data. Table 3 presents the compression ratios obtained by the proposed method on several datasets. Table 4 and Figure 4 present the comparison with other algorithms. The results show a significant improvement in the compression ratio compared with our previous work (RLE-Bzip2) and other compression algorithms, which reduces the number of packets to transmit and therefore the bandwidth and transmission time required.
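Because the CR equation itself is not reproduced here, the sketch below assumes the percentage form CR = 100 × (1 − CD/UCD), together with the 29-byte payload limit, to count packets. The sizes are hypothetical byte counts, not results from Table 3.

```python
import math

PAYLOAD_BYTES = 29  # maximum payload per packet, as stated above


def compression_ratio(cd, ucd):
    """Assumed percentage form: share of the original size saved by compression."""
    return 100 * (ucd - cd) / ucd


def packets_needed(size_bytes, payload=PAYLOAD_BYTES):
    """Number of packets needed to carry size_bytes of payload."""
    return math.ceil(size_bytes / payload)


ucd, cd = 1000, 300  # hypothetical original and compressed sizes in bytes
print(compression_ratio(cd, ucd))               # 70.0
print(packets_needed(ucd), packets_needed(cd))  # 35 11
```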

Compression Rate.
In addition to CR, another parameter for evaluating a compression algorithm is the compression rate (CRate), given in (6). It is defined as the number of output bytes per input character [27]:

$$\mathrm{CRate} = \frac{CD}{UCD} \qquad (6)$$

The ratio represents the number of bytes per character with which the compressed data are represented. Table 5 compares the compression rate of the proposed method with that of other compression algorithms. Unlike the compression ratio, the lower the compression rate, the better the compression performance of the algorithm [9]; on the same datasets, the proposed method gave good results in terms of compression rate.
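The compression rate can be measured directly on real compressors. The Python sketch below computes CRate = CD/UCD for Bzip2 (standard `bz2` module) and LZMA (standard `lzma` module) on a made-up text stream; the data and its formatting are hypothetical, so the numbers it prints say nothing about the results in Table 5.

```python
import bz2
import lzma


def compression_rate(original, compressed):
    """CRate = CD / UCD: compressed bytes per original byte (lower is better)."""
    return len(compressed) / len(original)


# Hypothetical stream of small integer values, as produced after delta encoding.
stream = " ".join(["0", "0", "1", "0", "-1", "0", "0", "1"] * 200).encode("ascii")

for name, compress in (("bzip2", bz2.compress), ("lzma", lzma.compress)):
    rate = compression_rate(stream, compress(stream))
    print(f"{name}: CRate = {rate:.3f} bytes per character ({8 * rate:.2f} bits per character)")
```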

Percentage Improvement of Compression Rate (PICR).
The percentage improvement of compression rate (PICR) [28] measures how much the proposed method improves on another algorithm; it is defined in terms of CRate_P and CRate_C, the compression rates of the proposed algorithm and of the compared algorithm, respectively. Since the proposed method is an optimization of our previous work (RLE-Bzip2), the gain over that method is of the utmost importance. The gain varies across datasets: it reaches its highest value, 14.23%, on the LU_84 Temp dataset and its lowest value, 1.18%, on the LG_20 Hum dataset. This corresponds precisely to the results obtained in terms of compression ratio. These gains over RLE-Bzip2 are modest, but the gains over the other traditional algorithms are significant, as Table 6 shows.
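The PICR equation is not reproduced here; the Python sketch below assumes the usual relative-improvement form, PICR = 100 × (CRate_C − CRate_P) / CRate_C, which is positive when the proposed method yields the lower (better) compression rate. Both the formula and the input values are assumptions for illustration.

```python
def picr(crate_proposed, crate_compared):
    """Assumed form: relative reduction of the compression rate, in percent."""
    return 100 * (crate_compared - crate_proposed) / crate_compared


# Hypothetical compression rates (not values from Table 6).
print(round(picr(0.30, 0.35), 2))  # 14.29
```

A negative result would mean the compared algorithm achieves the lower compression rate.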

Conclusions
Energy consumption was, and still is, the most important constraint in wireless sensor networks, and finding new methods to reduce or at least rationalize it is a hot research topic. Data compression has proved effective in reducing the size of the data to be transmitted, so the communication unit consumes less energy to transfer the same information. In this paper, we proposed a lossless compression approach that optimizes our previous work and is based on delta encoding and the proposed T-RLE algorithm. Our approach showed a slight improvement over our previous work and good results compared with other traditional algorithms. In future work, we will deploy this method in real WSNs, taking into account the output data of the Analog to Digital Converter (ADC).