Reducing the Energy Budget in WSN Using Time Series Models

Energy conservation is critical in the design of wireless sensor networks because it determines their lifetime. Reducing the frequency of transmission is one way of reducing the energy cost, but it must not compromise the reliability of the data received at the sink. In this paper, duty cycling and data-driven approaches are combined to shape the prediction approach used to reduce data transmission. Duty cycling keeps nodes inactive for longer periods to save energy, while the data-driven approach uses features of the data to predict the values the network needs during those inactive periods. Using the grey series model, a modified rolling GM(1,1) is proposed to improve prediction accuracy. Simulations suggest energy savings of 150%, measured as the ratio of predicted to transmitted values, without compromising the reliability of the data received.


Introduction
The wireless sensor network (WSN) is the backbone of ubiquitous computing applications such as military surveillance, disaster recovery, environmental and structural monitoring, health and security monitoring and control, wildlife and habitat monitoring, and precision agriculture. Sensor nodes, which are the basic components of a WSN, sense the environment, aggregate and process the data collected, and usually transmit the data from source nodes to a sink node called a base station [1].
With the growing need to monitor and control the environment, wireless sensor networks have become an indispensable tool for acquiring the required data and transmitting it to relevant endpoints. They are based on microelectromechanical systems (MEMS) technology, radio or wireless technology, and digital electronics, which together enable low-cost, low-powered miniature devices to communicate untethered over near and far distances. WSNs are made up primarily of sensor nodes, otherwise known as motes, which have sensing, data processing, and communicating capabilities and are usually deployed in large numbers because of their small size and low cost. A wireless sensor network may consist of a base station and/or a gateway and several deployed sensor nodes whose positions may be randomly determined or predetermined.
Characteristics of wireless sensor networks include, but are not limited to, their miniature size and low power. They do, however, have some drawbacks, such as limited memory space, low computational power, and low bandwidth [2].
The miniature size of the nodes influences the number of sensor nodes deployed in applications, which mostly ranges from several hundred to thousands. It also limits the number of components on the microchip and hence the capabilities of the nodes. As a result, the nodes usually have low power capability. The power constraint in WSN is a major consideration in their deployment and use. Of the major components of the typical node (radio, processor, storage, and power unit), the radio, which supports communication, is the major consumer of energy. The energy expended by the radio may account for up to 70% of the total energy expended in the sensor node [3].
Sensor networks may be deployed in a star, tree, mesh, or point-to-point topology, as shown in Figure 1.
The mesh topology is mostly employed in wireless sensor networks to mitigate the energy consumption problem discussed above. This is because mesh topologies have low transmit power, transmitting over shorter distances, as opposed to other topologies that allow long single-hop transmissions to the sink. In WSN applications where continuous monitoring of the sensed phenomenon is extremely important (e.g., health monitoring systems, structural health monitoring systems, road traffic monitoring systems, and water quality monitoring systems [4]), continuous communication of the sensed data is essential to maintain the reliability of the data received at the base station and, importantly, to detect any changes in the sensed environment. Such continuous transmissions add to the communication costs of the network and may deplete the batteries powering the nodes. The resultant effect is network failure due to several nodes dying off before the end of the network's operational lifetime [5]. Routing, duty cycling, retransmissions, and redundancy are some of the approaches used to ensure the reliability of data received at the base station. Lessening the number of retransmissions reduces the amount of energy wasted, since dropped data packets are not transmitted again, while introducing redundant nodes increases the overall energy cost of transmission in the network.
The implementation of WSN in environmental monitoring applications usually requires uninterrupted operations. This is to ensure that no incident is missed, and the data received can be relied upon. To guarantee the reliable acquisition and transmission of data, the power supplied to the node must be maintained or replenished [6]. However, continuous monitoring and transmission lead to power depletion. In recent times, there has been an increased interest in the use of predictive algorithms to augment the acquisition and transmission of actual data.
In this paper, we analyze two time series models, the autoregressive integrated moving average (ARIMA) and the grey model GM(1,1), as approaches to model and forecast time series data in WSN. The prediction modeling is aimed at determining the optimal period for scheduling the on and off periods of nodes in a network. Nodes are scheduled to be turned off for an optimum period determined by the prediction algorithm and turned on when new data must be collected for processing. While a node is turned off, the data for that period are predicted. Our simulations indicate that we were able to determine the optimum number of data points required and the longest sleep period for which prediction minimizes energy consumption without compromising the reliability and accuracy of the data received.

Literature Review
2.1. Conserving Energy in the Network. Energy conservation techniques proposed in the literature are broadly categorized as duty cycling, data-driven, and mobility approaches [7,8], as shown in Figure 2. These approaches have been studied extensively, either individually or in combination, to mitigate energy consumption in wireless sensor networks. Duty cycling approaches put the radio in low-power or sleep mode for longer periods when there is no data to be transmitted. The nodes therefore alternate between sleep and active modes according to the activity of the sensor node. A duty cycle may be defined as the ratio of the active time to the sleeping time [9,10]. Since duty cycling is generally not concerned with the characteristics of the data in transmission [7], data-driven approaches use the features of the data in transmission to mitigate energy consumption. Duty cycling approaches determine optimum schedules for turning nodes off or on for transmission with the aim of reducing energy consumption in the network. The aim is to reduce idle listening, when the node's radio waits in vain for frames, and overhearing, when nodes stay active listening to frames that are not of interest to them. Duty cycling approaches usually include scheduling algorithms that determine the periods when nodes transition between sleep and wake-up modes. The active periods of the individual sensor nodes usually have the same length, called the slot time. The transceiver of a sensor node may transmit packets at any time but can only receive packets when in the active state. Duty cycling reduces idle listening, overhearing, and overlistening, which are major causes of energy loss in WSN [10]. The scheduling algorithms implemented in duty cycling eliminate idle listening since they give nodes their share of access to the wireless medium [11].

Wireless Communications and Mobile Computing
Scheduling algorithms reduce idle listening and also ensure the efficient transmission of data between neighboring nodes, some of which may be in sleep mode while others are in active mode. While the duty cycle is fixed and determined before network deployment, scheduling algorithms may additionally be used to adjust the sleep-wake cycle to the network traffic.
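As a rough illustration of why duty cycling saves energy, the sketch below computes the duty cycle as defined above (active time divided by sleeping time) and the time-weighted average radio power over one sleep-wake cycle. The function names and the power figures in the usage note are illustrative assumptions and are not values from this study.

```python
def duty_cycle_ratio(active_s: float, sleep_s: float) -> float:
    """Duty cycle as defined in the text: active time divided by sleeping time."""
    if sleep_s <= 0:
        raise ValueError("sleep_s must be positive")
    return active_s / sleep_s


def average_power_mw(active_s: float, sleep_s: float,
                     p_active_mw: float, p_sleep_mw: float) -> float:
    """Time-weighted average radio power over one sleep-wake cycle."""
    period = active_s + sleep_s
    if period <= 0:
        raise ValueError("the cycle must have a positive duration")
    return (active_s * p_active_mw + sleep_s * p_sleep_mw) / period
```

For instance, a hypothetical radio that draws 60 mW when active and nothing when asleep, and that is active for 1 s out of every 4 s, averages 15 mW, a quarter of its always-on draw.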
Data-driven approaches are aimed at reducing the sampling and transmission of redundant data, even if sensing, as in some instances, constitutes only an insignificant portion of the energy consumption [12,13]. They also include reducing the amount of data transmitted without negatively affecting the quality and reliability of the resultant data [14,15]. In applications such as real-time systems where continuous monitoring is essential, data-driven approaches ensure that the energy consumed by sensing is kept within allowable thresholds [16,17].
Mobility may be an important consideration in the energy conservation scheme of sensor networks [18]. Mobility ensures that the physical location of nodes or the sink is adjusted to primarily reduce the distance between nodes to minimize energy consumption. It therefore mitigates the problems of static nodes during multihop communication, which usually have the energy hole problem where nodes closer to the sink are depleted of their energy [19].
Conservation of energy is usually achieved through the use of two or more energy conservation approaches. In this paper, duty cycling is used in conjunction with data-driven data reduction to reduce the frequency of transmission.
Data prediction as a data reduction method in WSNs is concerned with the building of a model for forecasting sensed parameters of sensor nodes using historical data [17]. The forecasted values are adopted when they fall within some predefined acceptable thresholds. Applications that use prediction in WSNs have the prediction model residing on the sink, the sensor node, or mostly on both. In most prediction approaches, the trend of data received at a node is modeled, such that only data deviating from the trend is communicated to the sink. This approach requires periodic synchronization of the model on the nodes with that on the sinks when received data deviates from the forecasted values. Data prediction approaches in the literature have been categorized into three main areas, which are stochastic, time series, and algorithmic approaches [7] as shown in Figure 3. Recent research has introduced machine learning algorithms into WSN with some successful applications [20][21][22]. Time series models have been used to predict future data values in WSN mostly to reduce the frequency of data transmission. Models such as autoregressive integrated moving average (ARIMA), autoregressive moving average (ARMA), least mean squares (LMS), and grey series have also been used in WSN.
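The threshold-based scheme described above, in which only readings that deviate from the forecast are communicated, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `simulate_dual_prediction` is hypothetical, and a naive "last synchronized value" predictor stands in for the time series model that a real deployment would run on both the node and the sink.

```python
def simulate_dual_prediction(readings, threshold):
    """Return (values known at the sink, number of transmissions).

    Node and sink share the same predictor. The node transmits a reading
    only when it deviates from the shared prediction by more than
    `threshold`; otherwise the sink keeps the predicted value.
    """
    if not readings:
        return [], 0
    last_sync = readings[0]        # the first reading is always transmitted
    sink_values = [readings[0]]
    transmissions = 1
    for actual in readings[1:]:
        predicted = last_sync      # assumed stand-in model: value is unchanged
        if abs(actual - predicted) > threshold:
            last_sync = actual     # resynchronize the model on node and sink
            sink_values.append(actual)
            transmissions += 1
        else:
            sink_values.append(predicted)
    return sink_values, transmissions
```

With readings `[10.0, 10.1, 10.2, 11.0, 11.05]` and a threshold of 0.5, only the first and fourth readings are transmitted, and every value held at the sink stays within the threshold of the true reading.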
The ARIMA models both stationary and stochastic data by "integrating" the simpler autoregressive (AR) and moving average (MA) models [23]. It is based on the Box-Jenkins methodology [24]. The grey series [25] models stochastic data whose values vary with time and is advantageous in forecasting time series data because the modeling is based on the generated series and not the raw data. It also supports near-accurate modeling of series as short as 4 data points. Time series models are increasingly used in WSN, with the most common ones being ARIMA [23,26], least mean squares [27,28], and the grey series [29,30]. In this paper, the ARIMA and grey series are compared for best performance.

2.2. Data Prediction Approaches. In WSN, time series models are increasingly being used to predict and reduce data transmission. Autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and least mean squares (LMS) algorithms are some of the linear algorithms applied to WSN. Their primary advantages over other predictive techniques are that they are the least complex, are simple to implement, and produce acceptable accuracies [31]. The accuracy of a time series model depends on the nature of the data sampled. For example, ARMA models are suitable for slowly changing data such as the temperature of a water body and its water level. Data that may exhibit sudden or sharp changes, such as potential of hydrogen (pH) readings affected by unforeseen human factors, are best modeled with MA, which gives a better ratio of complexity to performance. Grey series are time series models known for their superiority over other time series models due to their high accuracy and simplicity of use. The GM(1,1), also known as "grey series one differential one variable", is the most common of the grey series models used [32,33]. It is usually used to predict both linear and nonlinear data series. In this work, two popularly used time series models employed in WSN are discussed, and simulation analysis is performed to select the optimum energy-efficient method without compromising on reliability.

2.2.1. ARIMA. An ARIMA model is of order ARIMA(p, q, r), where p is the number of autoregressive terms, q is the number of nonseasonal differences needed to make the data stationary, and r is the number of lagged forecast errors in the prediction. The prediction is defined for all values of the time series X in terms of these components. ARIMA is applied to stationary data, meaning that the data series has no trend, varies around its mean with a constant amplitude, and has short-term random patterns that look the same over time. The reader may refer to [26,34] for a detailed explanation of using ARIMA for WSN data prediction.
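For reference, a general ARIMA model in the (p, q, r) ordering used above can be written with the backshift operator $B$, where $B X_t = X_{t-1}$. The symbols $\phi_i$ (autoregressive coefficients), $\theta_j$ (moving average coefficients), $c$ (constant), and $\varepsilon_t$ (forecast error at time $t$) are standard textbook notation introduced here for illustration and are not taken from the original text.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right) (1 - B)^{q} X_t
  = c + \left(1 + \sum_{j=1}^{r} \theta_j B^{j}\right) \varepsilon_t
```

Setting p = 0, q = 1, r = 0 and c = 0 reduces this to $X_t = X_{t-1} + \varepsilon_t$, the random walk.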

2.2.2. Grey Series. The grey series was initially presented by Deng in 1982 [35,36] and has been applied in several areas such as finance, physical control, engineering, and economics. It is an example of a time series prediction model that shows superiority to other statistical methods of prediction [25]. Artificial intelligence models in the literature include fuzzy systems [37], hidden Markov models [38], support vector machines [39], and neural networks [22,40]. Intelligent approaches such as neural networks give better prediction accuracies than their statistical counterparts but are usually very complex to implement. Neural networks (NN) are also criticized for requiring larger training data sets and longer training periods before their generalizations are acceptable. The grey series approach, also known as GM(n, m), is a grey model in which n is the order of the differential equation and m is the number of variables. It is widely used in the prediction of time-dependent data because of its ability to predict data that have uncertain characteristics without relying on conventional statistical properties. Its advantages include, but are not limited to, its ability to discover the inherent regularity in disorganized data in order to predict its performance variations at some point in the future [36].
Steps to the grey series include the following:

(i) Original data: let $X^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right)$ be a time series obtained from the environment, where n is the number of readings collected and consecutive readings are separated by some time t.

(ii) Accumulated generating operation (AGO): the accumulated sequence $X^{(1)} = \left(x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\right)$ is generated from $X^{(0)}$ as

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, n.$$

(iii) The original form of the GM(1,1) is

$$x^{(0)}(k) + a\,x^{(1)}(k) = b.$$

Using the adjacent neighbor means, the generated sequence $Z^{(1)}$ is obtained from $X^{(1)}$ as

$$z^{(1)}(k) = \frac{1}{2}\left(x^{(1)}(k) + x^{(1)}(k-1)\right), \quad k = 2, 3, \ldots, n,$$

and the basic form of the GM(1,1) is

$$x^{(0)}(k) + a\,z^{(1)}(k) = b.$$

In general, for a series of data $X(k) = x(1), x(2), x(3), \ldots, x(n)$, the adjacent neighbor mean is $Z(k) = \frac{1}{2}\left(x(k) + x(k-1)\right)$. The sequence obtained from the adjacent neighbor means forms the new series used for prediction based on the available time series data.

(iv) The parameters of the GM(1,1) are obtained by least squares. Writing $x^{(0)}(k) + a\,z^{(1)}(k) = b$ for each $k = 2, 3, \ldots, n$ and converting the resulting equations into matrix form gives $Y = B\hat{a}$, where

$$Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}, \quad B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}, \quad \hat{a} = \begin{bmatrix} a \\ b \end{bmatrix} = \left(B^{T}B\right)^{-1}B^{T}Y,$$

and n is the number of data values. The parameters a and b are known as the development coefficient and the grey action quantity of the GM(1,1), respectively. Being a univariate model, the GM(1,1) depends on the parameters a and b. The value of b reflects the changes in the data sequence, and its exact meaning is grey, as opposed to black box or white box methods. The model is invalid when $|a| \geq 2$: for values of a in $(-\infty, -2] \cup [2, \infty)$, the growth rate weakens and the simulation error of the prediction grows drastically.
The predicted values are obtained from Equation (13), the expanded form of the model equation. Applying the inverse AGO then gives the discrete response of the GM(1,1), which is the final prediction value obtained from the even difference grey model.
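The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the simulations: the function names `fit_gm11` and `forecast_gm11` are hypothetical, the least squares fit is written in closed form for the two parameters, and forecasts use the difference recursion $x^{(1)}(k) = \left[(1 - 0.5a)\,x^{(1)}(k-1) + b\right] / (1 + 0.5a)$, which follows from substituting $x^{(0)}(k) = x^{(1)}(k) - x^{(1)}(k-1)$ and the adjacent neighbor mean into $x^{(0)}(k) + a\,z^{(1)}(k) = b$.

```python
def fit_gm11(x0):
    """Estimate (a, b) in x0(k) + a*z1(k) = b by ordinary least squares."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 readings")
    # Accumulated generating operation (AGO): running sums of the raw data.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Adjacent neighbor means of the accumulated series, for k = 2..n.
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Closed-form least squares for y = slope*z1 + intercept,
    # where slope = -a and intercept = b.
    m = n - 1
    sz, sy = sum(z1), sum(y)
    szz = sum(z * z for z in z1)
    szy = sum(z * v for z, v in zip(z1, y))
    denom = m * szz - sz * sz
    if denom == 0:
        raise ValueError("degenerate series: cannot estimate parameters")
    slope = (m * szy - sz * sy) / denom
    intercept = (sy - slope * sz) / m
    return -slope, intercept


def forecast_gm11(x0, steps):
    """Forecast `steps` values beyond the end of the training series x0."""
    a, b = fit_gm11(x0)
    if abs(a) >= 2:
        raise ValueError("GM(1,1) is invalid when |a| >= 2")
    # Advance the fitted accumulated series to the end of the training data.
    x1_hat = x0[0]
    for _ in range(len(x0) - 1):
        x1_hat = ((1 - 0.5 * a) * x1_hat + b) / (1 + 0.5 * a)
    predictions = []
    for _ in range(steps):
        nxt = ((1 - 0.5 * a) * x1_hat + b) / (1 + 0.5 * a)
        predictions.append(nxt - x1_hat)  # inverse AGO: difference of sums
        x1_hat = nxt
    return predictions
```

For an exactly geometric series such as 1, 2, 4, 8, the fit is exact (a = -2/3, b = 2/3) and the next two forecasts are 16 and 32. The recursion also shows why the model breaks down for |a| ≥ 2: the factor (1 - 0.5a)/(1 + 0.5a) is undefined at a = -2 and is zero or negative for every other such value of a.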

Results and Discussion
Given a continuously flowing river, the quality parameters at a position $(x_1, y_1)$ at time $t_n$ may differ significantly from the parameters acquired at the same position at a different time, $t_{n+1}$. The quality parameters are therefore temporal, attributable to the flow of the river and changes in the river ecosystem. These changes may result from the introduction of pollutants into the water or from other human activities. The temperature of the water is also subject to change, varying with the time of day and the incidence of sunshine on the river. The accuracy of several predictive models is largely dependent on the size of the data: large data sets generally give better predictions than small ones. Original readings of water quality parameters from a river are presented in Figure 4. They are 48-hour readings of conductivity, temperature, and pH taken from 12:00 am on November 1, 2018, to 12:00 am on November 3, 2018. The total number of data points obtained in the 48 hours was 88 for each of the parameters: pH, conductivity, and temperature.

Conductivity values were predicted with GM(1,1) and ARIMA using varying numbers of data points as the training set and the remainder as the test set. The models obtained were ARIMA(0,1,0), ARIMA(0,1,1), and ARIMA(0,1,1), respectively, for 10, 15, and 20 readings. ARIMA(0,1,1) without a constant gives simple exponential smoothing. This addresses a weakness of the random walk model by smoothing out the noise and fluctuations of the nonstationary series. Hence, it uses an average of the last few observations, rather than only the most recent observation, to forecast. The predictions give a constant value with no variations.
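The flat ARIMA forecasts noted above can be reproduced with a small sketch. It assumes the textbook forms of the two models: a random walk for ARIMA(0,1,0) without a constant, and simple exponential smoothing for ARIMA(0,1,1) without a constant. The function names and the smoothing factor `alpha` are illustrative assumptions, not fitted values from this study.

```python
def random_walk_forecast(series, steps):
    """ARIMA(0,1,0) without a constant: every forecast equals the last reading."""
    return [series[-1]] * steps


def ses_forecast(series, alpha, steps):
    """Simple exponential smoothing.

    The level is a weighted average of past readings, and every
    multi-step forecast equals the final level.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * steps
```

In both cases the forecast is the same number for every future step, which is why these predictions appear as a horizontal line when plotted against the original readings.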
Using the ARIMA and even grey system models, the following predictions were obtained as shown in Figure 5.
Temperature values for the GM(1,1) and ARIMA predictions were taken with 15, 20, 25, and 35 data points as the training set and the rest as the test set. The ARIMA models were ARIMA(0,1,0), ARIMA(1,2,0), and ARIMA(1,1,0), respectively, for 15, 25, and 35 data points. The predictions and original values are presented in Figures 5-7. It is observed that GM(1,1) produces predictions that increase or decrease monotonically by a constant factor and that vary within the maximum and minimum original data readings. From Figure 5, the maximum pH value is 12.4 and the minimum pH value is 9.32. Similarly, in Figure 6, the maximum and minimum values read from the conductivity sensor are 195 and 125, respectively. Finally, in Figure 7, the maximum and minimum values recorded from the temperature sensor are 35.7 and 30.6, respectively. The ARIMA predictions showed nonseasonal differencing with a constant term from the last value of the original data readings, except for temperature, where the predictions were first-order autoregressive. This is because the training data are insufficient to model the variations seen in the original data set.

To test the performance of the prediction models, the mean absolute percentage error (MAPE), mean absolute deviation (MAD), and root mean square error (RMSE) were used as performance metrics. The MAPE is the percentage difference between the original and predicted values, while the MAD describes the distribution of the predicted values and shows how close together or spread out they are. The RMSE tells how far the predicted values are from the original values. GM(1,1) is generally a better model for predicting values from small data sets, i.e., below 25 data points, and in the case of GM(1,1) from as few as 4 data points.
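The three metrics can be computed as in the sketch below. The exact formulas used in the study are not shown in the text, so the common forecasting definitions are assumed here; in particular, MAD is taken as the mean absolute deviation between original and predicted values, and the function names are illustrative.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error (assumes no original value is zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def mad(actual, predicted):
    """Mean absolute deviation between original and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error; large errors are weighted more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

For original values 10 and 20 predicted as 11 and 18, the MAPE is 10%, the MAD is 1.5, and the RMSE is the square root of 2.5.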
3.1. Improving on GM(1,1). Given that GM(1,1) showed better predictions and better performance than ARIMA for small data sets with stochastic patterns, we use heuristic approaches to improve on the GM(1,1).

The aim is to find the smallest training set that gives the best prediction over short intermittent periods. Two approaches are used in this paper: the heuristic and the modified rolling GM(1,1).
We use predictions for 20 data readings to estimate the optimum length of effective prediction.
The heuristic algorithm compares the predicted values with the average of the maximum and minimum values, average(max, min), and adjusts the predicted values accordingly. Graphs of the original and adjusted values of pH, conductivity, and temperature are presented in Figure 8.
To obtain the smallest data set required for a good prediction, and the optimum length of prediction that still reveals the sudden changes that may occur in a flowing river, we use the pH data to make forecasts of varying lengths from statistically small training sets (i.e., fewer than 30 data points) and determine the optimum length of good forecasts.

The Rolling GM Approach. The grey prediction performs quantitative forecasts of data readings using unascertained characteristics of the data. This is done by applying sequence operators to the original readings to generate and extract hidden patterns and so establish a model for future predictions. When the data are chaotic and do not provide a good fit for most models, modifications are made to the original GM(1,1). One such heuristic approach is the rolling GM(1,1) [30]. The rolling grey series forecasts time series values using a constant window size of past data, that is, a fixed number of values for each prediction. The rolling GM(1,1) is defined by the basic equation of the original GM(1,1), $x^{(0)}(k) + a\,z^{(1)}(k) = b$.
A constant window size is predefined by the user. For example, with a window of four readings, each round of prediction uses four consecutive original readings to predict the values that follow them. For each round of prediction, k is increased by 1, so that the window becomes $x^{(0)}(k+1), x^{(0)}(k+2), x^{(0)}(k+3), x^{(0)}(k+4)$, and the preceding data predict the next round of values. In [30], the equation of the rolling GM(1,1) is defined in the following equation.
where n is the window size and r is the round whose value is to be predicted. Using a heuristic approach, we modify the constant window as follows: assuming a constant window size of n and forecasting values in iteration i, for k = 0, 1, 2, ⋯, n, k is increased by 1 only when i % n = 0 and i > n.

In our heuristic approach, the inverse AGO does not predict the original values used as the training set. Equation (21) becomes the basic equation of the proposed approach. After every round kn, the prediction selects the data values immediately following, from kn + n, as the training set for the next iteration. The predicted values do not cover the period of the original data. This reduces the computational cost relative to the original grey series, which forecasts both the prediction period and the period of the original data.
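One possible reading of this scheme is sketched below: the node transmits a window of real readings, the sink forecasts the next few values from that window alone, and the cycle repeats with the next real readings. The function names `gm11_forecast` and `rolling_schedule`, the default window of 4 and horizon of 6, and the handling of a short trailing block are assumptions made for illustration; the forecast uses the difference recursion implied by $x^{(0)}(k) + a\,z^{(1)}(k) = b$ and is not necessarily the exact update used in the study.

```python
def gm11_forecast(window, horizon):
    """Fit GM(1,1) to `window` and forecast `horizon` values beyond it."""
    n = len(window)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 readings")
    x1 = [sum(window[:k + 1]) for k in range(n)]            # AGO
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]   # neighbor means
    y = window[1:]
    m = n - 1
    sz, sy = sum(z1), sum(y)
    szz = sum(z * z for z in z1)
    szy = sum(z * v for z, v in zip(z1, y))
    # Least squares for y = slope*z1 + b with slope = -a.
    # Assumes non-degenerate data, e.g. positive sensor readings.
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a, b = -slope, (sy - slope * sz) / m
    # Advance the fitted accumulated series to the end of the window.
    level = window[0]
    for _ in range(n - 1):
        level = ((1 - 0.5 * a) * level + b) / (1 + 0.5 * a)
    forecasts = []
    for _ in range(horizon):
        nxt = ((1 - 0.5 * a) * level + b) / (1 + 0.5 * a)
        forecasts.append(nxt - level)   # inverse AGO, future values only
        level = nxt
    return forecasts


def rolling_schedule(readings, window=4, horizon=6):
    """Alternate `window` transmitted readings with `horizon` predicted ones.

    Returns the series as seen at the sink and the number of readings
    that actually had to be transmitted.
    """
    sink_view, transmitted = [], 0
    i = 0
    while i + window <= len(readings):
        train = readings[i:i + window]
        sink_view.extend(train)                  # real values are transmitted
        transmitted += window
        steps = min(horizon, len(readings) - (i + window))
        sink_view.extend(gm11_forecast(train, steps))  # node sleeps meanwhile
        i += window + steps
    tail = readings[i:]                          # too short to train on: send
    sink_view.extend(tail)
    transmitted += len(tail)
    return sink_view, transmitted
```

With 20 readings, a window of 4, and a horizon of 6, only 8 readings are transmitted and the remaining 12 are forecast at the sink.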
Using the minimum allowable data set for prediction with the GM(1,1) (i.e., 4 readings), training sets of between 4 and 10 readings are used to make predictions of between 2 and 10 values, as presented in Figures 9-12. Figure 9 shows the modified GM(1,1) approach using a rolling GM approach. The predicted values substitute for the original values intermittently to improve the prediction accuracy of the model.
In Figure 10, predictions of different lengths are made with GM(1,1), alternating with 6 original values.
In Figure 11, predictions of different lengths are made with GM(1,1), alternating with 8 original values.
In Figure 12, predictions of different lengths are made with GM(1,1), alternating with 10 original values.
In Figure 13 and Table 1, we compare the relative errors for training sets of 4, 6, 8, and 10 data points when forecasting 2, 4, 6, 8, and 10 values. The best relative errors were found for 4 data points predicting 2 values, 6 data points predicting 2 and 4 values, and 8 data points predicting 2 and 6 values. When compared with the corresponding energy savings per training size and prediction length, 4 data points predicting 6 values give the best relative error vis-à-vis the energy savings. The energy savings were calculated from the number of values transmitted versus the number of predictions made, given that no transmission is required during the period of prediction.

Figure 8: Graphs of pH, conductivity, and temperature using the heuristic approach to improving on GM(1,1).

For example, if 4 values are transmitted and 4 values are predicted, then the node saves 100% of its energy during the prediction period. The best relative errors were obtained when 4 values were transmitted and 6 values predicted, which gives

$$\text{energy savings} = \frac{\text{values predicted}}{\text{values transmitted}} \times 100\% = \frac{6}{4} \times 100\% = 150\%.$$

This suggests that 150% of the energy of the nodes is saved for each cycle of original and predicted values. This energy saving does not depend on other network conditions that affect energy loss but only on the ratio of energy losses due to transmission and reception.
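The arithmetic behind this figure can be checked with the short sketch below. The first function reproduces the measure used in the text (predicted values per transmitted value). The second is a complementary view added here for illustration, not a result from the study: the share of all values in one cycle that never had to be transmitted. Both function names are hypothetical.

```python
def savings_ratio_percent(transmitted, predicted):
    """Predicted values per transmitted value, as a percentage."""
    return 100.0 * predicted / transmitted


def transmissions_avoided_percent(transmitted, predicted):
    """Share of all values in one cycle that were predicted, not transmitted."""
    return 100.0 * predicted / (transmitted + predicted)
```

For 4 transmitted and 6 predicted values, the first measure gives 150%, while 60% of the values in each 10-value cycle are never transmitted.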

Conclusions
Requiring large data sets for forecasting may not be ideal for WSN because of the cost of data transmission. GM(1,1) makes good predictions with statistically small data sets, that is, fewer than 30 data points. The minimum required for GM(1,1) is 4, which reduces the energy spent on transmission. A sensitivity analysis performed with GM(1,1) on training sets of 4 to 10 data points showed better performance for GM(1,1). Two approaches, the heuristic and the rolling GM, were used to improve the performance of GM(1,1) so that it provides near-accurate predictions for shorter periods in a continuous stream of data. The heuristic approach reduces the length of prediction and improves performance by modifying the inverse AGO step of GM(1,1) to exclude the training data from the prediction. It then performs a rolling GM(1,1) to continuously predict real-time stochastic data while maintaining the required reliability. Comparing the percentage energy savings with the relative errors of the predictions and their lengths shows that the minimum of 4 data points is ideal if predictions are rolled over a constant window of 6 predicted values. This suggests a 150% savings in energy while ensuring the optimum accuracy of the predicted data and the reliability of the continuous data received at the receiving node. In future work, a network simulator or a real testbed experiment with real-time values may be used to verify the proposed approach.

Data Availability
All necessary data is available in this transcript. The other data are available in the thesis submitted to the University of Ghana.