Raspberry Pi Based Intelligent Wireless Sensor Node for Localized Torrential Rain Monitoring

Wireless sensor networks have proved effective for long-term localized torrential rain monitoring. However, the widely used architecture of wireless sensor networks for rain monitoring relies on network transport and back-end computation, which delays the response to heavy rain in localized areas. Our work improves this architecture by applying logistic regression and support vector machine classification on an intelligent wireless sensor node built on Raspberry Pi. The front-end sensor nodes not only obtain data from sensors but also analyze the probability of upcoming heavy rain independently and give timely early warnings to local clients. Because the sensor nodes send only the probability to the back-end server, the burden on network transport is relieved. Simulation results demonstrate that our sensor system architecture has the potential to improve the local response to heavy rain. The monitoring capacity is also raised.


Introduction
The development of the Internet of Things (IoT) signals a shift in the sources of data. An increasing proportion of the data collected today is generated by sensors. From this point of view, the public's demand for accurate environmental information may be met by large-scale wireless sensor networks (WSN) [1] based on advanced Information Communication Technologies (ICT) infrastructures. Currently, contextualized and location-aware environmental sensor networks (ESN) [2] are the mainstream in this area. ESN is promising mainly because of inexpensive embedded and system-on-chip hardware, convenient access to communication networks, and the decreased cost of data storage.
In most existing ESN systems, the flood of sensor-generated data pours into the back-end server without processing, which places a heavy load on network transmission. In harsh environments such as torrential rain and typhoons, a short network failure can seriously paralyze the monitoring system. The database server may receive missing or corrupted data for a long period of time. In that case, the ESN loses the capabilities of recording and forecasting environmental changes, which are its original purposes. Therefore, it is desirable to learn the environmental information in the front-end, so that the ESN system can respond to different environmental situations with less help from the back-end. This new data processing system, which we call the intelligent wireless sensor node, is the core of future front-end ESN systems. With increasing intelligence in the embedded system that collects and learns from the sensor data, the ESN's performance in predicting and capturing sudden extreme environmental changes will improve. Also, life-long training enables the sensor node to reach better accuracy.

Case Study: Localized Torrential Rain (LTR) Monitoring.
Torrential rain is a phenomenon that can significantly affect residents in the immediate surroundings of localized areas. It can cause flooding and road closures, which significantly affect day-to-day travel in the flooded areas. Hence, early warning is desired so that travel plans can be adjusted in advance. Localized torrential rain (LTR) is usually preceded by a series of changes in weather phenomena. Therefore, it is possible to predict LTR using past and current meteorological information. When early warnings are published [3,4], local residents are able to make smart travel choices.
LTR has been found especially challenging in environmental monitoring. In principle, if the movement of cumulus clouds is captured by satellite or radar, then the precipitation in a certain area can be estimated by analyzing the acquired images. However, this approach is inapplicable because the current spatial and temporal resolutions of radar and satellite images are far below what is required. As ESN has gradually replaced remote sensing approaches in applications such as air [5] and water [6] quality monitoring, it may outperform existing approaches for localized torrential rain monitoring [7][8][9]. In fact, ESN has become a substitute for surface climate stations owing to its high resolution and low cost. If we compare a city's ICT infrastructure to a person [10], wireless sensor networks are the hidden neurons inside the body. Weather information obtained by sensors is usually transmitted from distributed wireless sensor nodes to the back-end server (the cloud), where correlation analysis is conducted among all the received data [11]. Some positive results have been achieved by ESN in collecting localized meteorological information. However, ESN lacks an intelligent solution for monitoring LTR in a timely, efficient, and automatic manner.

Raspberry Pi Based Intelligent Wireless Sensor Node for LTR Monitoring.
To offer an intelligent solution for timely, efficient, and automatic LTR monitoring, we introduce Raspberry Pi as the wireless sensor node. Raspberry Pi is a well-known single-board computer (SBC), which has quickly occupied the embedded system market thanks to its comprehensive abilities and low cost. Moreover, it is gradually becoming a mainstay of environmental sensor information systems. However, it is a waste of resources to use an SBC only for collecting and packaging sensor data; advanced data analysis can be done within the SBC.
Inspired by the knee-jerk reflex, a well-known biological phenomenon, we developed an intelligent wireless sensor node for distributed heavy rain monitoring. The main idea is demonstrated in Figure 1. Multiple sensors act as the sensory nerve: information such as temperature, humidity, solar radiation, and rainfall is collected and transmitted to Raspberry Pi (the sensor neuron and spinal cord). Statistical and machine learning methods are implemented on Raspberry Pi to evaluate the connection between rainfall and environmental conditions.
Then, the probability of upcoming localized torrential rain is estimated and sent to the back-end database and server. Clients in the local area network also receive an early warning of heavy rain.
Our major contributions in this paper are as follows:
(i) Environmental information acquisition with high spatial and temporal resolution: the area of study is smaller than the minimum observation area that a satellite can resolve.
(ii) Techniques for reducing the ESN's failure time: our system keeps working when harsh environments degrade communication quality and paralyze current ESN. Valuable weather observations that other systems might miss are captured by our new sensor node.
(iii) Life-long learning and timely reaction with full use of prior knowledge: our system learns from labeled meteorological data using powerful supervised learning algorithms such as support vector machines (SVM) and accurately predicts the probability of upcoming LTR.
The remainder of this paper proceeds as follows. Section 2 presents the related works, which include WSN for localized heavy rain detection, Raspberry Pi for environmental monitoring, and probability estimation models for meteorological applications. Section 3 describes the Raspberry Pi based intelligent torrential rain sensing system. The system architecture, multiple low-cost sensors, local computing kernel based on Raspberry Pi, and information processing software are introduced in this section. Section 4 provides the state-of-the-art information theoretical outlier detection method we utilized in our system. Section 5 presents probability estimation models for heavy rain, which involve data normalization, the localized torrential rain correlation model, logistic regression, and multikernel support vector machines. Section 6 discusses the experiments and the results. The conclusion is given in Section 7.

Studies in [12] show that the attenuation of Wi-Fi or microwave signals may be connected with heavy rain in the nearby environment. Models like compressive sensing [13] are used to find the signals related to heavy rain. Admittedly, SMMN fills the gaps between the minimum pixels of radar and satellite images, which enables us to identify rainfall in limited areas. However, SMMN is founded on a stable and fast internet connection, which is required for its operation. All sensor information must be gathered and processed in the background, which leaves the normal operation of the sensors uncertain. Once the internet is disturbed by weather, sensor information may be lost.

Device: Raspberry Pi for Environmental Monitoring.
Since Raspberry Pi appeared on the market, plenty of attention has been paid to its use in environmental issues. At first, Raspberry Pi was considered one of the alternatives for the wireless sensor node in system design [14]. Thanks to open-source platforms and software, a sensor system can be constructed with Raspberry Pi and Arduino. It is widely acknowledged that this combination is low-cost and compatible with all sorts of environmental sensors.
Raspberry Pi also plays an essential role in processing environmentally relevant data. By extracting useful information from rainfall warning calls to authorities and from weather posts online [15], Raspberry Pi provides useful information for background analysis. Experiments have shown that this method made full use of information ignored by most meteorologists and brought novel ideas to localized rain warning.
Nevertheless, existing applications of Raspberry Pi do not make full use of its capacity for high performance computing. Especially now that MATLAB offers support for Raspberry Pi, scientific computing algorithms can run on embedded platforms. The implementation of complex algorithms on open-source systems will receive more and more attention in the future.

Processing: Outlier Detection on Sensor Data.
Due to the flood of machine-generated data received in database systems, hidden outliers and anomalous values must be effectively detected in advanced database systems. According to [16], outlier detection methods fall into three major categories: supervised, unsupervised, and semisupervised approaches. The difference among these three categories lies in the availability of labels in the training datasets [17]. For example, supervised anomaly detection is close to supervised classification, which requires training data labeled as normal or abnormal. Popular supervised models like support vector machines (SVM) [18,19] consider each training datum as a point in a multidimensional space. Then, they select a half-space that contains most of the points prelabeled as normal. Any test datum that falls outside this area is determined to be an outlier. Sometimes we want only to model normality or, in a few circumstances, to model abnormality [20,21]. In that case, only the normal class is taught, but abnormality can still be recognized. These approaches are called semisupervised outlier detection; they learn a model from a given normal dataset and calculate the likelihood of test objects. Both supervised and semisupervised detection methods are widely applicable when provided with a large volume of training objects. However, the situation that data scientists generally face at present is a shortage of historical data. Unsupervised approaches are required to determine abnormality without prior knowledge.
Unsupervised outlier detection has been a constant challenge in recent years [22]. Existing research focuses more on refining clustering algorithms such as K-neighbors [23], K-means [24], and K-methods [25] in order to meet the requirements of unsupervised detection. But most of those approaches are limited to numerical values. Only a few outlier detection techniques are designed to process categorical information such as name, gender, and address. Also, owing to the high dimensionality of datasets, complex statistical tests, and unnecessary approximation, the efficiency of these algorithms suffers. Thus, new, general, efficient, and unsupervised outlier detection algorithms are required for big data analysis. Previous improvements have been made by introducing information theory concepts such as entropy, mutual information, and conditional mutual information to characterize outliers. These techniques extend detection to both numerical and categorical data [26,27]. Because the existence of outliers increases the overall entropy of certain data attributes, outliers can be modeled as constraints that prevent datasets from reaching their optimal entropy. Information theoretic techniques work well for quick outlier determination.

Learning: Rainfall Probability Estimation.
Several probability estimation methods have been proposed in the literature. Rainfall probability is the most critical part of meteorological prediction. Probability and frequency analysis of rainfall data derives the expected rainfall occurrence and thus helps in better understanding spatial-temporal rainfall patterns [28]. Current rainfall prediction methods can be grouped into two categories: distribution fitting and classification approaches. The distribution fitting approaches select the best-fit distribution models for annual, seasonal, and monthly rainfall time series based on the values of statistical tests [29].
However, rainfall probability is so changeable that it cannot be covered by a single distribution. Instead, it is the result of combined factors. Therefore, logistic regression and SVM algorithms have become the mainstream in rainfall probability estimation. Logistic regression was chosen and implemented on a GIS system for quantitative prediction of rainfall and landslides in the study area [30]. SVM has been shown to improve the prediction of the Indian summer monsoon rainfall [31]. Also, hybrid models combining random forest (RF) and SVM have been used to predict the amount of rainfall on rainy days [32]. The accuracy achieved by SVM in this application shows its potential in rainfall probability estimation. Researchers may get better performance using SVM, especially with a large-scale historical sensor dataset.

Client lists will be refreshed periodically according to the interaction between the back-end database and the sensor nodes.

Raspberry Pi Based Localized Torrential Rain Sensing System
We now provide details of the three main parts of our ESN system: multiple low-cost sensors, a local computing platform based on Raspberry Pi, and information processing software. Both the fundamental hardware and the local service software are introduced. All the energy our system needs is supplied by green solar power or electrical power.

Multiple Low-Cost Sensors.
In order to capture rapid climate changes, an integrated sensor system must contain a comprehensive set of sensors so that we can know the exact state of a localized area. The sensors need to be able to measure temperature, humidity, solar radiation, air pressure, wind speed, and rainfall. Moreover, the sensors should be designed for long-term use and be accurate within an acceptable range. As Table 1 shows, we choose a series of low-cost [33] and low-power meteorological sensors. Our sensors also share the same error ratio, so that the influence of differences in error ranges is minimized.

... based intelligent algorithms can be implemented on Raspberry Pi, which motivates the idea of our paper.

Local Computing Platforms
As Figure 3 shows, all sensors are connected to an RS485 bus. The MODBUS protocol is used on Raspberry Pi to poll the sensors and read their data through a USB-RS485 converter. Raspberry Pi stores the received data and then conducts correlation analysis to estimate the probability of upcoming heavy rain. After the calculation, Raspberry Pi sends the results to locally registered clients through a Wi-Fi (wireless fidelity) communication module, and the database server receives the data package through a GPRS (General Packet Radio Service) module.
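The polling step can be illustrated with a minimal sketch of how a Modbus RTU "read holding registers" request is framed before it is written to the RS485 bus, and how a reply is checked. The slave address, register address, and register count are hypothetical, since the register map of the sensors is not given here, and the serial-port I/O itself is omitted.

```python
import struct

def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

def build_read_request(slave: int, start_reg: int, count: int) -> bytes:
    """Frame a function-0x03 (read holding registers) request."""
    body = struct.pack(">BBHH", slave, 0x03, start_reg, count)
    # The CRC is appended low byte first, unlike the big-endian payload.
    return body + struct.pack("<H", crc16_modbus(body))

def parse_read_response(frame: bytes) -> list:
    """Validate the CRC and return the 16-bit register values."""
    if crc16_modbus(frame) != 0:  # the CRC over a complete valid frame is zero
        raise ValueError("CRC mismatch")
    byte_count = frame[2]
    payload = frame[3:3 + byte_count]
    return list(struct.unpack(">" + "H" * (byte_count // 2), payload))

# Hypothetical sensor at address 1, one register starting at 0x0000.
request = build_read_request(1, 0x0000, 1)
```

On the node, the request bytes would be written to the USB-RS485 converter and the reply passed to `parse_read_response`; how raw register values map to physical units depends on each sensor's datasheet.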

Information Processing Software.
The layers of our software are shown in Figure 4. From top to bottom, the obtained sensor data are managed by the data management module. It stores the sensor data in an on-board MySQL database and controls the exchange of sensor data between Raspberry Pi and the back-end server. The sensor data are interpreted by the MATLAB database toolbox installed on Raspberry Pi [34]. The data processing algorithm is the core of the whole software. Both logistic regression and support vector machine classifiers are available for MATLAB programs to train and test, as discussed later. Our programs periodically compare the test results of several popular algorithms and select the best fitting model according to its accuracy in LTR prediction and its efficiency. All subsequent localized torrential rain estimation is based on the chosen model. With the receiver list given by the database server, Raspberry Pi sends the probability of localized heavy rain to nearby clients through local Wi-Fi. The receiver list is updated periodically by the back-end according to clients' registration on our website. The back-end server database can also get the sensor data stored on Raspberry Pi through its interaction with Raspberry Pi's data management system. Once the command is sent by the back-end server, the sensor data for a given period are packed and transmitted from Raspberry Pi to the back-end server through GPRS protocols [35,36].

Default Outlier Detection
In this section we present the outlier detection method that filters the original sensor data before rainfall probability estimation. We introduce and refine the information theoretical outlier function developed by Wu and Wang [26] and apply the new function to anomaly identification on environmental sensor data.

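Entropy and Mutual Information.
For an attribute $y_i$, the entropy is

$$H(y_i) = -\sum_{v \in \mathrm{domain}(y_i)} p(v) \log p(v),$$

where the set of possible values of $y_i$ is called its domain and written as $\mathrm{domain}(y_i)$, which is $\{v_1, v_2, \ldots\}$, and $p(v)$ is the probability that $y_i$ takes the value $v$. The mutual information is developed as follows to evaluate the correlation of heterogeneous information:

$$I(y_i, y_j) = H(y_i) + H(y_j) - H(y_i, y_j),$$

where $H(y_i, y_j)$ is the joint entropy of $y_i$ and $y_j$, which uses the joint probability of $y_i$ and $y_j$ to calculate the entropy.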
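As an illustration of these two quantities, the following minimal sketch computes entropy and mutual information for discrete (for example, binned) attribute values. The attribute names and sample values are hypothetical, and base-2 logarithms are an assumption.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def joint_entropy(a, b):
    """Entropy of the paired values (a_k, b_k)."""
    return entropy(list(zip(a, b)))

def mutual_information(a, b):
    """I(a; b) = H(a) + H(b) - H(a, b)."""
    return entropy(a) + entropy(b) - joint_entropy(a, b)

# Hypothetical binned readings: a humidity level and a rain flag.
humidity = ["high", "high", "low", "low"]
rain = ["yes", "yes", "no", "no"]
```

With these samples the two attributes determine each other, so their mutual information equals the entropy of either one; for unrelated attributes it approaches zero.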

Information Theoretical Outlier Detection.
Frequency based algorithms cannot handle massive, informative datasets because the values of their attributes have too much variance. Therefore, Wu and Wang [26] previously conducted deep research on the essence of outliers. In that paper, they derive the outlier function using the differential holoentropy, which takes both the entropy and the total correlation of data attributes into account. However, their work mainly focuses on detecting abnormal objects rather than abnormal values, so they sum up data from different attributes and add weights to each value to make a combined factor for an object. After the outlier factors are compared with a threshold, outlier objects can be detected. This method rules out the opportunity to derive connections among outliers from various attributes. Therefore, in this section, we refine their model and focus on outlier detection within one attribute. In their function, the entropy term represents the joint entropy of the attributes, two coefficients are the reciprocal values of the cardinality of the original attributes and of the attributes without the value under test, and a counting term denotes the number of times that value appears in the $i$th attribute.
In our work, environmental data fluctuate and are highly correlated. The weights of different attributes are almost the same. Moreover, we want to preserve the normal values in an object instead of deleting the whole object. So we refine the function by giving uniform weights and then focus on individual attributes rather than objects. The refined function, denoted OF, thus gives the outlier factor of a value within one attribute.
In the next step, this function is applied to an outlier detection algorithm as in Algorithm 1.
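The idea that outliers raise the entropy of an attribute can be sketched as follows. This is not the OF function or Algorithm 1 of this paper; it is a simplified stand-in that scores each distinct value of a single attribute by how much the entropy drops when one occurrence of that value is removed. The readings, the binning, and the threshold are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def entropy_drop_scores(values):
    """Score each distinct value by the entropy decrease obtained when one
    occurrence of it is removed; larger scores suggest outliers."""
    base = entropy(values)
    scores = {}
    for v in set(values):
        rest = list(values)
        rest.remove(v)  # drop a single occurrence of v
        scores[v] = base - entropy(rest)
    return scores

def flag_outliers(values, threshold):
    """Return the distinct values whose score exceeds the threshold."""
    scores = entropy_drop_scores(values)
    return sorted(v for v, s in scores.items() if s > threshold)

# Hypothetical humidity readings binned to the nearest 10 percent.
readings = [60] * 9 + [10]
```

Because only the flagged values are reported, the remaining attributes of the same record stay available, which matches the goal of keeping normal values instead of deleting whole objects.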

Probability Estimation Model for LTR
In this part we discuss the localized torrential rain estimation models used in our information processing software. After default outlier detection, the meteorological data attributes must be normalized so that they can be processed by the logistic regression and SVM models. Because meteorological data attributes are correlated according to past research [3], SVM kernels that model the data vectors by their lengths might reach a high level of accuracy. Therefore, we introduce two kernels in order to get better performance.

Data Normalization.
Because the attributes are measured in different units, all the collected data must be standardized before they can be used for analysis. There are two major methods of normalization:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad x^{*} = \frac{x - x_{\text{mean}}}{\sigma}.$$

Here $x^{*}$ represents the result of standardization. $x_{\min}$, $x_{\max}$, $x_{\text{mean}}$, and $\sigma$ are the minimum, maximum, average, and standard deviation of all received data, respectively. Because calculating the variance of the data requires more computing time and the mean of the rainfall data is less valuable in our discussion, we choose the first method as our standardization process.
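A minimal sketch of the first (min-max) method is given below. The sample temperatures are hypothetical, and mapping a constant series to 0.0 is an assumption made here to avoid division by zero.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant series maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical air-temperature readings in degrees Celsius.
temperature = [18.0, 21.0, 24.0, 30.0]
normalized = min_max_normalize(temperature)
```

Only the running minimum and maximum need to be kept on the node, which is consistent with the stated preference for the cheaper of the two methods.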

Estimated LTR Probability
Definition 2 (estimated heavy rain probability). Generally, for a dataset $X = [x_t, x_h, x_p, x_w, x_s]$ whose attributes are the normalized temperature, humidity, air pressure, wind speed, and solar radiation, the estimated probability of heavy rain is a function of the attributes weighted by $W = [w_t, w_h, w_p, w_w, w_s]$, which represents the weight of each attribute.
As for training samples, the label is

$$y = \begin{cases} 1, & \text{rainfall more than 50 mm per day or 30 mm per 12 hours}, \\ 0, & \text{otherwise}. \end{cases}$$
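The labeling rule for training samples can be written as a small function. The argument names are hypothetical; the thresholds are the ones stated above.

```python
def ltr_label(rain_24h_mm: float, rain_12h_mm: float) -> int:
    """Return 1 for a heavy-rain training sample and 0 otherwise."""
    return 1 if rain_24h_mm > 50.0 or rain_12h_mm > 30.0 else 0
```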

Logistic Regression.
Logistic regression, which is used to estimate the probability of a binary response based on one or more variables, is one of the prevailing models in various economic and medical applications. When a logistic function is established to measure the relationship between a categorical dependent variable and independent variables, the accuracy of categorization improves. In our paper, with weight vector $W$ and normalized attribute vector $X$, the logistic function is given by

$$P = \frac{1}{1 + e^{-W X^{\top}}}.$$

For later evaluation,

$$\ln \frac{P}{1 - P} = W X^{\top}.$$

Therefore, the heavy rain probability estimation problem has been transferred to a linear model. From this point of view, our statistical hypothesis is defined as

$$H_0^{(i)}: w_i = 0.$$

The Wald test is also utilized for estimating parameters. We assign the Wald test model as below:

$$Z_i = \frac{\hat{w}_i}{\mathrm{SE}_i},$$

where $\hat{w}_i$ represents the predicted weight $w_i$ of vector $W$ for each attribute index $i$ and $\mathrm{SE}_i$ stands for its standard error. Given the level of significance $\alpha$, hypothesis $H_0^{(i)}$ is rejected when

$$|Z_i| > z_{\alpha/2}.$$

After vector $W$ is fully estimated with acceptable Wald test scores, the logistic model for rainfall probability is complete.
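The fitting and testing procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the MATLAB implementation used on the node: the weights are fitted by batch gradient ascent, an intercept term is added, standard errors come from the inverse Fisher information, and the single-feature sample data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, steps=5000):
    """Batch gradient ascent on the log-likelihood; returns [bias, w_1, ...]."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for r, t in zip(rows, y):
            err = t - sigmoid(sum(wi * ri for wi, ri in zip(w, r)))
            for j, rj in enumerate(r):
                grad[j] += err * rj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w

def invert(M):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    k = len(M)
    A = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(M)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[k:] for row in A]

def wald_z(X, y, w):
    """Wald statistic w_j / SE_j, with SE_j from the inverse Fisher information."""
    rows = [[1.0] + list(r) for r in X]
    k = len(w)
    info = [[0.0] * k for _ in range(k)]
    for r in rows:
        p = sigmoid(sum(wi * ri for wi, ri in zip(w, r)))
        for a in range(k):
            for b in range(k):
                info[a][b] += p * (1.0 - p) * r[a] * r[b]
    cov = invert(info)
    return [w[j] / math.sqrt(cov[j][j]) for j in range(k)]

# Hypothetical normalized humidity values and heavy-rain labels.
X_demo = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9]]
y_demo = [0, 0, 0, 1, 0, 1, 0, 1, 1]
w_demo = fit_logistic(X_demo, y_demo)
z_demo = wald_z(X_demo, y_demo, w_demo)
```

Each entry of `z_demo` would then be compared against the critical value for the chosen significance level to decide whether the corresponding weight is kept in the model.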

Multikernel Support Vector Machines.
Logistic regression does not make full use of the prior knowledge gained from historical data. Support vector machines (SVM) [31] outperform logistic regression in binary classification, especially with a growing training dataset [32]. When variables are mapped to a higher dimensional space through different SVM kernels, nonlinear classification can be achieved by identifying the maximum margin hyperplane between the two classes [18,19]. Because weather information such as temperature and humidity is highly correlated, kernels that summarize the characteristics of different data transactions may perform better. We introduce two length based SVM kernels to estimate the probability of heavy rain.
Also we introduce the classic polynomial kernel for comparison.
Polynomial kernel: $K(x_i, x_j) = (x_i^{\top} x_j + c)^{d}$, with constant $c$ and degree $d$.

Experiments

System Installation and Study Area.
Our system is installed at the top of the main building on our campus. The whole intelligent sensor node serves as a meteorological station that characterizes the weather of Wuhan University. As Figure 5 shows, the sensors are powered by solar panels, and excess power is stored in dedicated batteries for later use. The sensor node is also connected to our building's Wi-Fi so that local clients (the students and staff in the laboratory) can receive the probability of heavy rain every 5 minutes. All the predictions are sent through GPRS channels to our laboratory (another building about 2 miles away). The sensor platform was installed ... 6.
For rainfall forecasts on a small grid, the European Centre for Medium-Range Weather Forecasts (ECMWF) is broadly applied [37]. ECMWF ERA-Interim provides 6-hour forecasts for any 0.75° × 0.75° (longitude and latitude) area. In our experiment, our system and ECMWF predict rainfall in the same 0.75° × 0.75° region, which is centered at N 30°32.25′, E 114°21.10′. We also have a centralized WSN, which treats the sensor system as a pure collector and relies heavily on networks. The results are compared and discussed in Section 6.3.

LTR Dataset.
To compare the efficiency of our ESN and ECMWF, we construct an LTR dataset for evaluation. The LTR dataset is a collection of environmental information in the study area, including rainfall, air pressure, temperature, humidity, and solar radiation, from both the centralized WSN and our ESN (Table 2). We also have an ECMWF forecast dataset that predicts the rain probability 4 times a day. These two datasets record the weather information and forecasts from Sep 26, 2015, to June 15, 2016. In the LTR dataset, there are 25000 data objects that were randomly selected from this period. Each transaction contains rainfall, air pressure, temperature, humidity, and solar radiation data. They have been validated as correct, so we can use them as the test dataset. Also, as Table 3 shows, there are 25000 training samples each for the centralized WSN and our ESN.

Comparison and Discussion.
In this section, we conduct tests to evaluate the effectiveness and efficiency of our ESN, the centralized WSN, and ECMWF. To test the working time of our ESN and the centralized WSN, we compare their fault rates. For the accuracy test, we plot the accuracy of ECMWF, SVM, and logistic regression versus the size of the training sample. Having found that only SVM with our ESN beats ECMWF in LTR estimation, we then compare the different SVM kernels in both accuracy and efficiency.

Efficiency between Our ESN and Centralized WSN.
Before the probability estimation, we must calculate the downtime during which our ESN and the centralized WSN cannot transmit any information. The default outlier detection method presented in the previous part is also applied to the original dataset. In our work, we use the fault rate defined below to represent the efficiency of the environmental monitoring system.
Definition 4 (ESN fault rate). Given a dataset $X$, after downtime detection and outlier detection, it still has $n$ valid transactions $\{x_1, x_2, \ldots, x_n\}$, which present the environmental information from our ESN and the centralized WSN in $d$ days (Table 4). The fault rate is computed from the number of valid transactions $n$ over the $d$ days of observation. As Table 4 shows, from Sep 26, 2015, to June 15, 2016, our system has a lower fault rate, which implies that our ESN keeps working when the traditional centralized WSN stops working. This improvement in efficiency enables our ESN to deal with more LTR situations with more prior knowledge.
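The exact fault rate formula is not recoverable from the text here, so the following sketch only illustrates one plausible reading: the share of expected transactions that are missing or invalid. The formula itself, the function name, and the `samples_per_day` parameter are all assumptions rather than the definition used in this paper.

```python
def fault_rate(valid_count: int, days: int, samples_per_day: int) -> float:
    """Assumed form: 1 - n / (d * samples expected per day)."""
    expected = days * samples_per_day
    if expected <= 0:
        raise ValueError("days and samples_per_day must be positive")
    return 1.0 - valid_count / expected
```

For example, under the hypothetical assumption of one transaction every 5 minutes (288 per day), 270 valid transactions in one day would give a fault rate of 0.0625.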

Performance of LTR Probability Estimation.
After evaluating the fault rate of the two WSN architectures, we test the performance of our ESN, the centralized WSN, and ECMWF on the LTR dataset. The training data transactions from both the centralized WSN and our ESN are learned by linear SVM and logistic regression. In Figures 7 and 8, we plot the accuracy of ECMWF, SVM, and logistic regression on LTR classification versus the size of the training sample. As Table 5 and Figures 7 and 8 imply, our ESN has higher classification accuracy than the centralized WSN. Linear SVM also gives better results than logistic regression on this problem: only the SVM trained on our ESN data outperforms ECMWF.

Evaluation of Multiple Kernels on SVM Based Rainfall Probability Estimation.
In this subsection, the accuracy and efficiency of rainfall probability estimation are tested for SVM with various kernels. Because SVM has been shown to achieve higher accuracy in rainfall probability estimation, it is meaningful to compare the performance of SVM with different kernels. Also, in the LTR dataset, the meteorological data attributes are highly correlated, so length based SVM kernels may perform better.
In Figure 9, we plot the accuracy of various SVM kernels on rainfall probability estimation versus the size of the training sample. The length based kernels estimate rainfall probability better than traditional SVM kernels such as the linear and polynomial kernels. However, Figure 10 shows that the length based SVM kernels cost much more time than the linear and polynomial kernels. There is thus a tradeoff between accuracy and efficiency.

Conclusion
Accurate and timely localized torrential rain monitoring with prior data is a core challenge in environmental sensor networks. Although a series of wireless sensor networks has been developed for environmental monitoring, the current architecture relies too much on network transport, which causes information loss and response delay when harsh environmental conditions paralyze the network. In this paper, we developed a Raspberry Pi based intelligent wireless sensor node that can estimate the probability of LTR from self-collected environmental data and publish forecasts in a localized area using Wi-Fi. Our ESN achieved higher efficiency and a lower fault rate than the centralized WSN. Moreover, SVM on our system achieved higher accuracy in LTR estimation.

Figure 2: Enhancement in WSN architecture for localized torrential rain monitoring.

Table 1: Meteorological sensors produced by Fuyuan Technology Feike Electronic Company in Wuhan, China.

Figure 3: Integrated sensor system based on Raspberry Pi.

Figure 5: Picture for system installation (map from Google Earth).

Figure 9: Results of SVM with different kernels on LTR.

Figure 10: Running time of SVM with different kernels on LTR.
