Compression Algorithm of Road Traffic Spatial Data Based on LZW Encoding

The wide application of road traffic detection technology to road traffic state data acquisition has introduced new challenges for the transmission and storage of road traffic big data. In this paper, a compression method for road traffic spatial data based on LZW encoding is proposed. First, the spatial correlation of road segments is analyzed by principal component analysis (PCA). Then, the road traffic spatial data compression based on LZW encoding is presented, and parameter determination is discussed. Finally, six typical road segments in Beijing are adopted for case studies. The results indicate that the road traffic spatial data compression method based on LZW encoding is feasible and that the reconstructed data can achieve high accuracy.


Introduction
The advent of big data brings unprecedented opportunities as well as challenges, especially in the field of transportation and traffic engineering [1,2]. With the rapid development of science and technology, the intelligent transportation system (ITS) has developed continuously, and its applications have become wide ranging. An ITS can accomplish the tasks of road traffic data acquisition, processing, and transmission, as well as traffic state analysis, route guidance, and traffic control. As various road traffic detection systems are adopted, the collected road traffic state data grow and become massive. This poses a challenge for the real-time transmission, storage, and guidance of massive road traffic data. Thus, it is necessary to find an efficient approach to compressing real-time traffic state data, which can save considerable storage space and support other applications [3]. Compression of road traffic state data also supports the work of transportation administrators, and a useful compression method can be applied in transportation research.
The essence of road traffic state data compression is to represent the signal information with less data. Through effective compression and reconstruction, traffic data transmission and storage can be achieved [4][5][6].
In recent years, a great many data compression methods have been explored in the traffic and transportation fields. With the popularity of machine learning and data mining among practitioners and researchers, several road traffic compression methods have been presented. Given the multidimensional and multigranularity characteristics of traffic and transportation big data, the PCA method compresses road traffic state data by reducing the dimensions of the original data [7]. As an emerging technology, compressed sensing has also been used in data compression. Compressed sensing breaks through the restriction of the traditional Nyquist sampling theorem and can collect and compress data simultaneously. Making use of the redundancy of road traffic states, compressed sensing achieves the estimation [8] and compression [9][10][11] of road traffic state data. Since road traffic state data possess spatial-temporal correlation and similarity, Xiao et al. presented a spatial-temporal model based on road traffic data compression and decompression using the 2D discrete wavelet transform, realizing denoising compression for ITS [12]. Ou et al. proposed road traffic volume data compression based on an artificial neural network [13].
Some modified and improved methods also fill the compression gap. The embedded devices in motor vehicles generate abundant data with which researchers can investigate the compression of road traffic state data [14]. Making use of the GPS positioning data produced by travelers' mobile devices, Ma et al. presented a differential preprocessing method and adopted a dynamic Huffman algorithm to compress GPS positioning data [15]. Wang et al. put forward an encoding algorithm with a self-adaptive switching mode according to a specific format [16]. Hou presented a stop-wave mode based on the concept of the compression factor and its differential equation [17]. Song et al. proposed a hybrid spatial compression algorithm and an error-bounded temporal compression algorithm to compress the spatial and temporal information of trajectories, respectively [18]. However, many studies neither share a common baseline for performance analysis nor provide the infrastructure to operate on a publicly available dataset.
The existing road traffic data compression methods mainly focus on the compression of road traffic network data. In recent years, however, little has been written on compression methods for road traffic spatial data from different road segments at similar time nodes. Some studies on prediction have examined traffic both temporally and spatially, in the road traffic field as well as in the wider field of transportation.
The travel needs and travel routes of traffic participants exhibit certain regularity; thus, the road traffic spatial states of different road segments at similar time nodes are strongly related. That is, the curves of road traffic spatial state on different road segments at similar time nodes are similar. This correlation makes the compression of road traffic spatial data promising. Thus, based on the spatial correlation of road traffic states, the road traffic spatial data on different road segments at similar time nodes are extracted for compression. LZW inherits the merits of LZ77 and LZ78 in compression efficiency and speed, and it easily achieves good performance; it is therefore adopted in this study. Based on the spatial correlation of road traffic, a compression method for road traffic spatial data based on LZW encoding is proposed in this paper.
The proposed method compresses the road traffic spatial state data at the same time intervals, enabling efficient transmission, storage, and display. The compressed road traffic state data can also be used for feature extraction and traffic state prediction. Multivariate time series analysis is similar to the proposed method in that it can take both spatial and temporal correlations into consideration; however, the aims of the two differ: in our study, the spatial correlation of road traffic states is used to compress the state data. Although the proposed compression method is tested on road traffic state data, it is also useful for transportation management and transportation prediction. Besides, the compression can be used for feature extraction, which can be applied to evaluate traffic running states.
Based on the characteristics of road traffic flow, the PCA method can be used to analyze the correlation of spatial road segments [19,20]. Then, the spatial road segments are selected to extract the data for compression. Here, the spatial road segments are the different road segments, and their data are extracted at the same time intervals.
The contributions of the proposed algorithm are threefold:
(1) The PCA method was introduced to the algorithm to select the road segments with spatial correlation.
(2) A novel road traffic spatial data compression algorithm based on LZW encoding was proposed to construct the difference data on selected spatial road segments under the same mode.
(3) The proposed algorithm could determine the optimal parameters in the training process based on spatial historical data and base data on road traffic states.
The rest of this paper is organized as follows. The modeling methodology of the proposed algorithm is discussed in Section 2. In Section 3, parameter determination for the road traffic spatial data compression based on LZW encoding is presented. The experimental results are shown in Section 4. The conclusion and directions for future studies are discussed in Section 5.

Compression Algorithm of Road Traffic Spatial Data Based on LZW Encoding
2.1. Framework of the Algorithm. The processes of compression and reconstruction of road traffic spatial data are shown in Figures 1 and 2, respectively. First, the PCA method was used to select the road segments with spatial correlation. The road traffic spatial data under the same mode on different road segments were acquired to construct the reference sequences of road traffic characteristics. Based on the analysis of spatial correlation, the base road segment was selected, and the data on it were regarded as the spatial base data. Second, the historical data on the other spatial road segments under the same mode were extracted as training data, and the optimal threshold of the road traffic spatial difference data was determined based on the road traffic spatial base data under the same mode. Third, real-time spatial data on the other road segments under the same mode were acquired as experimental data, and the road traffic spatial difference data were computed with respect to the road traffic spatial base data under the same mode. Finally, the compression and reconstruction of the road traffic spatial difference data were achieved through LZW encoding and decoding, respectively.

[Figure 1 flowchart steps: road traffic spatial road segments acquisition; the base road segment and base data acquisition; the spatial road segment and the spatial data acquisition; the difference data acquisition and compression based on LZW encoding.]

[Reference sequence table columns: Reference sequence ID; Reference sequence name; Description.]

[Figure 2 flowchart steps: the difference data decompression based on LZW decoding; the difference data of base data and real-time data acquisition; the road traffic spatial real-time data reconstruction.]

Selection of Road Segments with Correlation Based on PCA Method. Road traffic flows possess the characteristics of periodicity, similarity, correlation, and so on. The road traffic flows of spatial road segments exhibit a strong spatial correlation. Thus, the PCA method was used in this study to select the correlated road segments.
PCA is a multivariate statistical method that eliminates the correlation among the variable indicators. The multidimensional road traffic state data can be effectively reduced to two dimensions, which can be illustrated in a 2D figure. Taking advantage of this property, the related road segments can be selected. The process has been described in previous studies [19,20].
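The reduction to two dimensions can be illustrated with a minimal sketch. It assumes a data layout that is not specified in the text (one row per road segment, one column per collection interval), uses synthetic volume profiles rather than the Beijing data, and the function name `pca_project_2d` is introduced here only for illustration.

```python
import numpy as np

def pca_project_2d(volumes: np.ndarray) -> np.ndarray:
    """Project each road segment (one row) onto the first two principal components."""
    # Center every time column across the segments.
    centered = volumes - volumes.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Synthetic example (assumption): four segments share one daily profile p,
# two segments follow a different profile q.
t = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
p = 100.0 + 50.0 * np.sin(t)
q = 100.0 + 50.0 * np.cos(t)
volumes = np.vstack([1.0 * p, 1.1 * p, 0.9 * p, 1.05 * p, 1.0 * q, 1.2 * q])

coords = pca_project_2d(volumes)  # shape (6, 2): one 2D point per segment
```

In the resulting 2D scatter, segments whose points lie close together have similar volume curves and are candidates for the correlated group.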

Division of Road Traffic Running Modes.
The road traffic running modes can be divided at two levels: the road network level and the road segment level. Assuming that the road network level and the road segment level are divided into $g$ and $h$ submodes, respectively, the road traffic running modes can be divided into $g \times h$ modes in total. The modes can be written as $M = \{M_{11}, M_{12}, \ldots, M_{gh}\}$. The values of $g$ and $h$ are determined by the mode division criteria. At the road network level, the criteria mainly refer to the factors that affect road traffic running modes on different dates. At the road segment level, they refer to the factors arising from the specific conditions of the road segments, as illustrated in Figure 3.

Construction Design of Road Traffic Characteristics Reference Sequences. Assuming the collection period of road traffic state data is $\Delta t$, the time format of the road traffic information template can be illustrated as in Figure 4. The table format of the road traffic characteristics reference sequence is described in Tables 1 and 2.
Let $n + 1$ denote the total number of selected road segments, which can be described as follows:
$$S = \{s_1, s_2, \ldots, s_{n+1}\},$$
where $n + 1$ is the number of spatial road segments; $s_i$ ($1 \le i \le n + 1$) denotes the $i$th road segment; and $S$ represents the set of selected road segments with correlation.
Based on the correlation of road traffic spatial data, the base road segment is chosen, and its road traffic data are extracted as the road traffic base data. The road traffic data on the other $n$ spatial road segments are extracted as historical data and real-time data.

Optimal Threshold Determination of Road
Figure 3: The division chart of road traffic running mode.
Based on the formulas in (2), the optimal threshold of the difference data can be identified.

Acquisition of Road Traffic Spatial Difference Data.
The spatial data on the other road segments were extracted as real-time data. Under the same running mode, the road traffic difference data were acquired as the difference, at each collection moment, between the real-time data on a road segment and the spatial base data on the base road segment. The symbols are described in Table 4.
The string table created by LZW encoding does not need to be stored along with the data, because the same string table can be rebuilt during decompression. Thus, the compression ratio can be improved further. Based on LZW encoding, the road traffic spatial data compression can be achieved. The optimal threshold of the difference data between a road segment and the base road segment is applied to the difference data between that road segment and the base road segment under the same running mode. Combined with LZW encoding, the compression of the difference data between the road segment and the base road segment can then be realized.
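The LZW step can be sketched as follows. This is a textbook byte-oriented LZW with a 256-entry initial table, not necessarily the exact variant used in the study; serializing the difference data as comma-separated text is likewise an assumption made only to obtain bytes, and the sample `diffs` values are hypothetical. Note that the decoder rebuilds the string table on its own, so only the codes need to be stored.

```python
def lzw_encode(data: bytes) -> list:
    """Encode bytes into a list of integer LZW codes."""
    table = {bytes([i]): i for i in range(256)}  # initial single-byte entries
    next_code = 256
    w = b""
    out = []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc                      # keep extending the current phrase
        else:
            out.append(table[w])        # emit the longest known phrase
            table[wc] = next_code       # learn the new phrase
            next_code += 1
            w = bytes([b])
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes: list) -> bytes:
    """Decode LZW codes, rebuilding the string table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:         # phrase defined by the code being read
            entry = w + w[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)

# Hypothetical thresholded difference data: mostly zeros, a few larger deviations.
diffs = [0] * 20 + [12, 0, 0, -9] + [0] * 20
payload = ",".join(str(d) for d in diffs).encode("ascii")
codes = lzw_encode(payload)
restored = [int(x) for x in lzw_decode(codes).decode("ascii").split(",")]
```

Because thresholded difference data contain long runs of repeated values, the encoder quickly learns long phrases, which is why far fewer codes than input symbols are emitted.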

Road Traffic
The symbols are explained in Table 5.
The compression ratio is the number of difference data before LZW encoding divided by the number of data after LZW encoding.
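As a worked illustration of this ratio, let $n$ be the number of difference data before encoding and $m$ the number of codes after encoding; both symbols are placeholders introduced here, not necessarily the paper's notation. The value $n = 720$ corresponds to one day of data at a 2 min interval, while $m = 72$ is a purely hypothetical code count.

```latex
\mathrm{CR} = \frac{n}{m},
\qquad
n = 720,\; m = 72
\;\Longrightarrow\;
\mathrm{CR} = \frac{720}{72} = 10 .
```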

Parameter Determination
In the process of road traffic spatial data compression based on LZW encoding, several parameters are involved: the road traffic data on the base road segment and on each spatial road segment at every collection moment, the difference data derived from them, the threshold, and the thresholded difference data (see Tables 3 and 5). The parameter settings discussed here concern only their effect on the road traffic spatial data compression based on LZW encoding. Analyzing the effect of each parameter separately cannot guarantee an optimal algorithm, because the parameters influence the accuracy of the algorithm in different ways. All of the parameters should therefore be considered together when analyzing the compression results.
The compression ratio is introduced to measure the effect of the parameters on the algorithm. Finally, the parameter values are determined through statistical analysis of the reconstructed road traffic states.
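One way such a parameter search could look is sketched below. Several points are assumptions rather than the paper's procedure: the threshold rule (differences whose magnitude does not exceed the threshold are set to zero), the selection rule (largest compression ratio among thresholds whose mean absolute error stays within `max_mae`), the function names, and the synthetic data.

```python
import numpy as np

def lzw_code_count(symbols):
    """Number of codes LZW emits for a sequence of hashable symbols."""
    table = {(s,): i for i, s in enumerate(sorted(set(symbols)))}
    w, count = (), 0
    for s in symbols:
        wc = w + (s,)
        if wc in table:
            w = wc
        else:
            count += 1                  # emit the code for w
            table[wc] = len(table)      # learn the new phrase
            w = (s,)
    return count + (1 if w else 0)

def sweep_thresholds(segment, base, thresholds, max_mae):
    """Return (threshold, compression ratio, MAE) with the best ratio within the error bound."""
    diff = segment - base
    best = None
    for e in thresholds:
        kept = np.where(np.abs(diff) > e, diff, 0)       # assumed threshold rule
        cr = len(kept) / lzw_code_count(kept.tolist())
        mae = float(np.mean(np.abs(diff - kept)))         # error caused by zeroing
        if mae <= max_mae and (best is None or cr > best[1]):
            best = (e, cr, mae)
    return best

# Synthetic training data (assumption): a segment that tracks the base closely.
rng = np.random.default_rng(0)
base = (100 + 50 * np.sin(np.linspace(0, 2 * np.pi, 720))).astype(int)
segment = base + rng.integers(-3, 4, size=720)
best = sweep_thresholds(segment, base, thresholds=[0, 1, 2, 3], max_mae=1.0)
```

Sweeping the threshold jointly against both the compression ratio and the error reflects the point made above that the parameters should not be tuned in isolation.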

Road Segment Acquisition.
The proposed compression algorithm operates on spatially related road traffic data; thus, the selected data must exhibit spatial correlation. The road segments are briefly described here. All of them are expressways of similar width. The volume data on six typical road segments in Beijing were adopted in the present study. The specific road segments are listed in Table 6.
Five days (June 11, 18, 19, 25, and 26, 2011) of road traffic data were extracted to construct the reference sequences of road traffic characteristics. The road traffic state data collection interval is 2 min. As noted for the correlation of road segments in the literature [19,20], the first two principal components can reflect most of the information of the road traffic state. Based on the PCA method, four road segments, HI3009b, HI3008b, HI7058b, and HI7036b, exhibited strong correlation, which was then confirmed by cross correlation.
The volume data on the six road segments from June 11, 2011, were extracted to determine the spatial correlation. The cross correlation is shown in Table 7, from which the correlation between all pairs of road segments can be read.
As shown in Table 7, the cross correlations between HI3008b and the other three road segments (HI3009b, HI7058b, and HI7036b) were greater than 0.9. Thus, the HI3008b road segment served as the base road segment, and its collected data were taken as the base data. The volume data on the four road segments were selected for the case study to evaluate the performance of the proposed algorithm. The choice of dates can be explained as follows.
The regularity of volume changes is mainly determined by the regularity of people's origin-destination (OD) travel. On different dates, however, OD travel changes randomly, whereas travel on weekends is comparatively regular. Thus, four days (June 18, 19, 25, and 26, 2011) of road traffic data on the spatial road segments were extracted to construct the reference sequences of road traffic characteristics.
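The cross-correlation check and the choice of a base road segment can be sketched as follows. The selection rule used here (the segment with the most partners above the 0.9 cutoff, ties broken by mean correlation) is an assumption for illustration, as are the function name, the placeholder segment names "A" to "F", and the synthetic volumes.

```python
import numpy as np

def pick_base_segment(volumes, names, min_corr=0.9):
    """Return the chosen base segment and the segments strongly correlated with it."""
    corr = np.corrcoef(volumes)                 # rows are road segments
    off = corr.copy()
    np.fill_diagonal(off, -np.inf)              # ignore self-correlation
    strong = (off > min_corr).sum(axis=1)       # number of strong partners
    mean_corr = (corr.sum(axis=1) - 1.0) / (len(names) - 1)
    best = max(range(len(names)), key=lambda i: (strong[i], mean_corr[i]))
    partners = [names[j] for j in range(len(names)) if off[best, j] > min_corr]
    return names[best], partners

# Synthetic volumes (assumption): A-D share one daily profile, E and F do not.
t = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
p = 100.0 + 50.0 * np.sin(t)
volumes = np.vstack([
    p,
    p + 5.0 * np.cos(3 * t),
    p + 5.0 * np.sin(5 * t),
    p + 5.0 * np.cos(7 * t),
    100.0 + 50.0 * np.cos(2 * t),
    100.0 + 50.0 * np.sin(4 * t),
])
names = ["A", "B", "C", "D", "E", "F"]
base_name, partners = pick_base_segment(volumes, names)
```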

Data Description.
The road traffic data collected on the HI3009b, HI7058b, and HI7036b road segments on June 11, 2011, were used as training data for parameter setting. Under the same mode, the road traffic data collected on the same three road segments on the four other days were regarded as real-time data to validate the proposed algorithm. The running time, which indirectly reflects the computation speed of the proposed method, is also reported: over several tests, the average running time is approximately 0.45. This suggests that the proposed method is simple and practicable.
The statistical reconstructed results of spatial volume data based on LZW encoding on the HI3009b and HI7058b road segments from June 18, 19, 25, and 26, 2011, are illustrated in Tables 8 and 9, respectively. CR, AE, and marerr denote the compression ratio, the mean absolute error, and the absolute relative error percentage, respectively; the fourth indicator is the error standard deviation. In these indicators, the error data are the differences between the original and the reconstructed real-time data on a road segment at each collection moment under the given mode, and the mean error is the mean of these error data.
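The error indicators can be computed along the following lines. The exact formulas are not legible in the text, so the definitions below are common conventions assumed for illustration: mean of absolute errors, mean of absolute errors relative to the original value in percent, and the sample standard deviation of the errors. The function name and the four sample values are hypothetical.

```python
import numpy as np

def reconstruction_indicators(original, reconstructed):
    """Return (AE, marerr in percent, error standard deviation) under assumed definitions."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    err = original - reconstructed
    ae = float(np.mean(np.abs(err)))
    # Relative error assumes no zero volumes in the original data.
    marerr = float(100.0 * np.mean(np.abs(err) / original))
    sd = float(np.std(err, ddof=1))
    return ae, marerr, sd

ae, marerr, sd = reconstruction_indicators([100, 200, 400, 500], [90, 210, 400, 525])
# ae == 11.25 and marerr == 5.0 for these four hypothetical values
```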

Sensitivity Analysis.
A sensitivity analysis is the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs [21]. In Section 4.2, four road segments are selected; HI3008b is used for training and the others are used for testing. To test the effect of data size on the compression and reconstruction results, a sensitivity analysis is needed. Since the proposed algorithm is intended for big data in road traffic, a sensitivity analysis is also required to test its feasibility for small and medium-sized data. The data size is indicated by the collecting time. A sensitivity analysis is therefore conducted by evaluating the compression and reconstruction indicators (CR, AE, marerr, and the error standard deviation) under different collecting times. Parameter determination is performed as in Section 3, but there the optimal parameter is determined under a fixed collecting time; for different collecting times, the optimal parameters will differ. Thus, the collecting time is treated as a variable when testing the compression results. In this process, we also follow the rule in (9).
Here, a brief description of the data is provided. In Section 4.1, one day of collected data (720 records) is used for the experiment. To test the feasibility of the proposed method, we calculate the experimental indicators under different collecting times on HI7058b, which serves as a test bed. The results of the sensitivity analysis are given in Tables 11-14.
From the results in Tables 11-14, the compression ratio for large data is greater than that for small and medium-sized data, and AE, marerr, and the error standard deviation are all less than 10. The results show that the proposed algorithm is feasible.
A comparison is also provided. Since PCA is a well-known data compression method, we compare the proposed method with it, using the reconstruction indicators on June 19, 2011. The specific results are shown in Table 15.
From Table 15, the CR of LZW encoding is much greater than that of PCA, while the AE, marerr, and error standard deviation of PCA and LZW are very similar. The comparison indicates that the proposed method performs comparatively better.

Analysis of Experiment Results.
Based on the experiment results in Section 4.2, the following analyses are presented: (1) From Tables 8-10, the following results can be obtained for the reconstructed volume data on the HI3009b, HI7058b, and HI7036b road segments, respectively: the average compression ratios are 9.91, 15.05, and 5.94; the average mean absolute errors are 12.15, 6.96, and 10.32; the average absolute relative error percentages are 13.79, 7.53, and 12.00; and the average error standard deviations are 14.12, 9.16, and 13.37. These statistics show that the performance on the HI7058b road segment is better than that on the HI3009b and HI7036b road segments.

Figure 1: The process of road traffic spatial data compression based on LZW encoding.

Figure 2: The process of road traffic spatial data reconstruction based on LZW decoding.

Figure 4: The time format of road traffic characteristic reference sequence.

4.2. Results. The road traffic spatial volume data compression results based on LZW encoding on the HI3009b, HI7058b, and HI7036b road segments are illustrated in Figures 5-16.

Table 1: Road traffic characteristics reference sequence information chart.

Table 2: Road traffic characteristics reference sequence description chart.
Traffic Difference Data. The data on the other spatial road segments are extracted as training data. Under the same running mode, the road traffic spatial difference data are acquired with respect to the road traffic spatial base data, and threshold processing is then applied to them. Through LZW encoding, the optimal

Table 3
$\Delta t$: the collection period of road traffic state data.
The collection moment of each period of road traffic state data, a multiple of $\Delta t$ whose index runs from 0 up to the number of daily collected road traffic data.
The number of daily collected road traffic data.
A road segment under the given running mode.
The road traffic data on a road segment at a collection moment under the given mode.
The road traffic data on the base road segment at a collection moment under the given mode.
The road traffic difference data between the training data on a road segment and the base data on the base road segment at a collection moment under the given mode.
The road traffic difference data between the training data on a road segment and the base data on the base road segment over the time intervals from $\Delta t$ to a given collection moment under the given mode.
The road traffic difference data between the training data on a road segment and the base data on the base road segment after threshold processing, over the same time intervals under the given mode.
The threshold of the road traffic difference data over the same time intervals under the given mode.
The difference data between a road segment and the base road segment after LZW encoding, over the same time intervals under the given mode.
An individual datum of the difference data between a road segment and the base road segment after LZW encoding, over the same time intervals under the given mode.
The number of difference data between a road segment and the base road segment before LZW encoding, over the same time intervals under the given mode.
The number of difference data between a road segment and the base road segment after LZW encoding, over the same time intervals under the given mode.

Table 5
$E$: the optimal training threshold under the given running mode.
The difference data between the real-time data on a road segment and the base data on the base road segment after threshold processing, at a collection moment under the given mode.
The number of difference data between a road segment and the base road segment before LZW compression, over the time intervals from $\Delta t$ to a given collection moment under the given mode.
The road traffic difference data set of the road segments at a collection moment under the given mode.
The set of difference data on a road segment after LZW encoding at a collection moment under the given mode.
The number of data after LZW encoding at a collection moment.
An individual datum of the difference data after LZW encoding at a collection moment under the given mode.

In the reconstruction step, LZW decoding is applied to the encoded data to recover the spatial difference data on each road segment at each collection moment under the given mode; from these difference data, together with the base data, the reconstructed road traffic real-time data on that road segment are obtained.

Table 6: The road segments information.

Table 7: The cross correlation of the road segments.

Table 8: Reconstructed results on HI3009b road segment.

Table 10: Reconstructed results on HI7036b road segment.

Table 13: The sensitivity analysis results on June 25 on HI7058b.

Table 14: The sensitivity analysis results on June 26 on HI7058b.

Table 15: The indicator results of PCA and LZW encoding on June 19, 2011.