Based on Regular Expression Matching of Evaluation of the Task Performance in WSN: A Queue Theory Approach

Wireless sensor networks suffer from limited resources, inefficient real-time communication scheduling, and weak security. To address these problems, we propose a queuing performance evaluation approach based on regular expression matching. It has three phases: a matching preprocessing phase, a validation phase, and a queuing-model performance evaluation phase. First, the preprocessing phase generates subsets of related sequences, which guide distributed matching in the validation phase. Second, the validation phase clusters the feature subsets and compresses the matching table, which makes distributed parallel matching easier. Finally, the queuing model is used to evaluate the dynamic performance of task scheduling in the sensor network. Experiments show that our approach matches accurately and achieves a computational efficiency of more than 70%. It not only detects data packets and enforces access control effectively, but also uses the queuing method to determine the task scheduling parameters of wireless sensor networks. The method is well suited to medium-scale and large-scale distributed wireless nodes.


Introduction
The main mission of most wireless sensor networks (WSNs) is to detect and report environmental events. Because WSNs usually operate in harsh environments, their performance is often difficult or impossible to assess accurately. Evaluating the performance of task communication in WSNs has therefore become a research focus in recent years and one of the most attractive topics in the field.
Most current research has focused on how to provide authentication, confidentiality, integrity, nonrepudiation, and access control in ad hoc networks [1,2]. Distributed authentication is a very common approach to ad hoc security issues [3,4]. However, in certain circumstances there is no common approach to highly secure ad hoc networking [5,6].
Regular expressions offer excellent expressiveness and flexibility, so they have been widely used in a variety of network security applications. Examples include antivirus scanning, network intrusion detection and prevention systems [7], firewalls, and traffic classification and monitoring [8]. Deterministic finite automata (DFA) and nondeterministic finite automata (NFA) are the typical implementations of regular expressions. However, a DFA needs considerable storage and an NFA needs time-consuming processing cycles. The cost that resource-constrained wireless sensors can tolerate is only barely enough to meet application requirements. In recent years, several classes of performance evaluation methods based on queuing models of communication tasks have been published [9,10].
However, previous methods suffer from the following disadvantages. In a typical queuing model, customer arrivals and service processing are independent. In sensor communication task scheduling, however, they are correlated. A sensor cannot receive communication and run the state machine at the same time, so its time is shared between communication and communication task processing. Communication tasks also arrive with different priorities. High-priority tasks can preempt low-priority tasks, and the low-priority tasks resume after the high-priority tasks are complete. Finally, there are time constraints. Traditional system analysis usually considers the average time. Communication tasks between wireless sensors, by contrast, must consider the maximum time, after which the state machine transitions to another state.
Finite automata have been studied for regular expression matching in WSN security, but matching performance was not discussed further [11]. Similarly, performance evaluation of WSNs based on the queuing model has been discussed, but the security matching method was not studied in more depth [12]. Building on this previous work, this paper therefore discusses both the finite-automata security matching method and the performance of the queuing model in WSNs.
The Scientific World Journal
The rest of the paper is organized as follows. Section 2 reviews the two-stage regular expression matching strategy. Section 3 explains the regular expression matching approach. Section 4 proposes a task scheduling model based on queuing theory. Section 5 presents the priority queue with two classes of tasks. Section 6 describes the performance of wireless sensor communication tasks based on queuing theory. Section 7 presents simulations that illustrate the performance of our scheme. Finally, Section 8 concludes the paper.

Regular Expression Two-Stage Matching Strategy [11]
Recent research has paid much attention to reducing the huge memory usage of DFA-based regular expression matching, because the DFA is the preferred representation for regular expression matching. In practice, these techniques reduce memory only for specific regular expressions or simple signature sets. Real-world signature sets contain thousands of complex regular expressions, and high-speed matching for them is hardly achievable. TCAMs (ternary content-addressable memories, available as off-the-shelf chips) are widely deployed in modern networking devices. However, even when techniques such as D2FA [13,14] are employed, DFA and NFA tables are too large to be stored in TCAMs. In 2012, Liu et al. presented RegexFilter, a high-speed and memory-efficient technique [15]. It speeds up regular expression matching by quickly narrowing each arriving item down to as few candidate regular expressions as possible. However, this method optimizes only the profiteering stage and leaves the verifying stage without any optimization.

Profiteering Stage.
An item is an unmatched item of a regular expression set if it does not match any regular expression in the set [15]. For instance, given a regular expression set $R$, another set $R_0$ is constructed so that any unmatched item of $R_0$ is also an unmatched item of $R$. An item $x$ is first matched against $R_0$ to get the set $M(x; R_0)$. If $M(x; R_0)$ is empty, $x$ cannot match any member of $R$ and can be skipped safely. Otherwise, matching continues against $C(x; M(x; R_0))$, where $M(x; R) \subseteq C(x; M(x; R_0)) \subseteq R$. Figure 1 shows the relationship between matched items and the print $R_0$. Most items are unmatched, and matching against $R_0$ costs much less than matching against $R$. The overall throughput of this approach can therefore be much higher than that of matching directly against $R$.
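The two-stage idea can be sketched with Python's standard `re` module. This is a minimal illustration, not RegexFilter itself. The two expressions, their prints, and the helper names `prefilter`, `verify`, and `match` are hypothetical. Each print is a literal that every match of its full expression must contain, so an item that misses every print can be skipped safely. In this sketch the candidate set for the verifying stage is simply the set of expressions whose print matched.

```python
import re

# Hypothetical full regular expression set R.
FULL_SET = {
    "http_get": re.compile(r"GET /[a-z]+\.php\?id=\d+"),
    "ssh_banner": re.compile(r"SSH-2\.0-OpenSSH_\d+\.\d+"),
}

# Hypothetical prints: a literal that every match of the full expression
# must contain, so "print unmatched" implies "full expression unmatched".
PRINTS = {
    "http_get": re.compile(r"\.php\?id="),
    "ssh_banner": re.compile(r"SSH-2\.0-"),
}


def prefilter(item):
    """Profiteering stage: names of expressions whose print matches."""
    return {name for name, pattern in PRINTS.items() if pattern.search(item)}


def verify(item, candidates):
    """Verifying stage: run only the full expressions of the candidates."""
    return {name for name in candidates if FULL_SET[name].search(item)}


def match(item):
    candidates = prefilter(item)
    if not candidates:  # no print matched: the item is safely skipped
        return set()
    return verify(item, candidates)
```

For example, `match("GET /index.php?id=42")` passes the print check and is confirmed by the full expression. `"x.php?id=abc"` passes the cheap print check but is rejected in the verifying stage.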

Verifying Stage.
In the verifying stage, the main issues are how to build correlations from the profiteering prints and how to reduce the memory cost of DFA tables. A DFA is represented by a 5-tuple $(Q, \Sigma, \delta, q_0, A)$. Here $Q$ is a set of states, $\Sigma$ is an alphabet, $\delta: Q \times \Sigma \to Q$ is the transition function, $q_0$ is the start state, and $A \subseteq Q$ is the set of accepting states. The major problem is that DFA-based algorithms need a large amount of memory to store the transition table. Software-based [16-18] and FPGA-based [19,20] regular expression matching algorithms are the traditional approaches, but they have many shortcomings. TCAM-based solutions offer easy encoding and high parallelism [13]. Liu et al. proposed three novel techniques, transition sharing, table consolidation, and variable striding, to reduce TCAM space and improve matching speed.
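The memory problem follows directly from the 5-tuple definition. A table-driven DFA stores one next-state entry for every (state, input byte) pair, so the table has $|Q| \times 256$ entries. The sketch below is a hypothetical example and not the paper's implementation. It builds such a table for the simplest case, a substring search, using the classic KMP-style construction, and then runs it with one lookup per byte.

```python
ALPHABET_SIZE = 256  # |Sigma|: one column per input byte


def build_substring_dfa(pattern: bytes):
    """Return (delta, q0, accepting) for a DFA that accepts any input containing `pattern`."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    delta = [[0] * ALPHABET_SIZE for _ in range(m + 1)]  # dense |Q| x 256 table
    delta[0][pattern[0]] = 1
    restart = 0  # state reached on the current prefix minus its first byte
    for q in range(1, m):
        delta[q] = list(delta[restart])      # mismatch: behave like the restart state
        delta[q][pattern[q]] = q + 1         # match: advance one state
        restart = delta[restart][pattern[q]]
    delta[m] = [m] * ALPHABET_SIZE           # accepting state is absorbing
    return delta, 0, {m}


def run(delta, q0, accepting, data: bytes) -> bool:
    q = q0
    for byte in data:
        q = delta[q][byte]  # exactly one table lookup per input byte
    return q in accepting
```

A 4-byte pattern already needs $5 \times 256 = 1280$ entries. Real signature sets compile to DFAs with thousands of states, which is why the table size dominates memory.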

Regular Expression Matching Approach
Figure 2 shows the selection process for a regular expression with five atoms. The parameter $B = 256$ is the boundary we define, and the expression size (ES) of every print must be less than $B$. Selection starts from the first atom. The curr pointer keeps moving to the next atom while the ES of the print between the begin pointer and the curr pointer stays below $B$. It stops at the fourth atom "." (any of the 256 byte values), where $\mathrm{ES} = 1 \times 2 \times 1 \times 256 = 512 > B$. Because the condition $\mathrm{ES} < B$ no longer holds, the print formed by the atoms before the curr pointer is selected. A directed edge is then constructed to mark the correlation relationship. In step 2, the candidate print satisfies the condition but is already included in the selected print. According to Section A, the candidate has a higher matching probability (MP) than the selected print, so it is not selected. Steps 3, 4, and 5 apply the same criteria to select prints.
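The pointer walk can be sketched as a greedy scan. This sketch rests on two assumptions that the worked number supports but the text does not define. First, ES is taken as the product of the number of byte values each atom can match (1 for a literal, 2 for a two-character class, 256 for "."). Second, the scan stops at the first atom that would push ES to the boundary or beyond. The atom sizes and function names are hypothetical, and the MP-based pruning of later steps is not modeled.

```python
from math import prod


def expression_size(atom_sizes):
    """ES of a print: product of how many byte values each atom accepts."""
    return prod(atom_sizes)


def select_print(atom_sizes, begin=0, boundary=256):
    """Advance `curr` from `begin` while ES stays below `boundary`.

    Returns (begin, end, es): the selected print covers atoms [begin, end).
    An empty window (end == begin) means the first atom alone is too large.
    """
    es = 1
    curr = begin
    while curr < len(atom_sizes) and es * atom_sizes[curr] < boundary:
        es *= atom_sizes[curr]
        curr += 1
    return begin, curr, es


# Hypothetical five-atom expression: literal, [xy], literal, ".", literal.
ATOM_SIZES = [1, 2, 1, 256, 1]
```

With `ATOM_SIZES`, the first three atoms give ES = 2. Adding the "." atom would give 512, which exceeds the boundary, so `select_print(ATOM_SIZES)` stops at index 3.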
After the prints are selected, their relationships form a directed graph, called the correlation sequence, from the prints $P_{[1,\ldots,m]}$ to the regular expressions $R_{[1,\ldots,n]}$.
Every packet is transmitted across nodes $N_1, N_2, \ldots, N_k$ according to the ad hoc wireless protocol. These nodes are divided into two groups, one for the profiteering stage and the other for the verifying stage. Extra packet fields let the nodes work collaboratively and communicate with each other. Figure 3 demonstrates the packet matching process. Because each wireless node has limited computing power, the calculation of the score $S(x; C_{[1,\ldots,k]})$ is simplified to additions only. First, the feature vector $F$ is stored so that $S(x; C_{[1,\ldots,k]})$ can be computed as a sum over $F$. After the profiteering stage in node $N_1$, the profiteered correlation sequence $C_{[1,\ldots,k]}$ is generated in node $N_2$. If $C_{[1,\ldots,k]}$ is not empty, node $N_2$ calculates $S(x; C_{[1,\ldots,k]})$ by adding up the feature vector $F$. If $S(x; C_{[1,\ldots,k]})$ is larger than the threshold $\theta$, verification continues in node $N_2$ using the group stored in its memory. Otherwise, the packet is forwarded to the next hop, where matching continues.
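The per-hop decision can be sketched as follows. The text does not specify how the feature vector is summed. This sketch assumes each node holds its own hypothetical feature vector, indexed by the correlated expressions, and sums the entries selected by the correlation sequence. `node_decision`, `route`, and the threshold value are illustrative names, not the authors' code.

```python
def node_decision(correlation_seq, feature_vector, threshold):
    """Decide what one verifying-group node does with a packet."""
    if not correlation_seq:  # no print matched: nothing left to verify
        return "skip"
    # Addition-only score, to suit the limited CPU of a sensor node.
    score = sum(feature_vector[i] for i in correlation_seq)
    return "verify" if score > threshold else "forward"


def route(correlation_seq, hop_features, threshold):
    """Return the index of the first hop that verifies locally, or None."""
    for hop, features in enumerate(hop_features):
        action = node_decision(correlation_seq, features, threshold)
        if action == "skip":
            return None
        if action == "verify":
            return hop
    return None  # path exhausted without local verification
```

For example, with correlation sequence `[1]`, a first hop whose feature entry is 1 forwards the packet under threshold 3, and a second hop whose entry is 4 verifies it, so `route` returns hop 1.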

Queue Model Description [12]
Communication between wireless sensors follows one of two modes: synchronous or asynchronous. In synchronous mode, when multiple communication tasks are triggered, task scheduling is suspended, and the tasks are then executed in sequence according to their query and process priority levels. This mode is efficient when transmissions are infrequent. When transmissions are triggered frequently, however, it leads to an unacceptably high loss rate of communication tasks. In asynchronous mode, a task to be transmitted is not processed immediately. Instead, communication tasks are added to the queue in priority order. The wireless sensor's state machine then fetches the communication task at the head of the queue and executes the task scheduling function. This mode greatly reduces task loss. Determining the maximum length of the communication task queue that forms in the WSN is also of great help in guiding sensor network design. Figure 4 shows $n$-task communication scheduling in the WSN based on the queuing theory model.
Input process: communication tasks are divided into $n$ levels. The first level has the highest priority, the second level the second-highest, and so forth, and level $n$ has the lowest priority. Assume that the interarrival times of communication tasks follow a negative exponential distribution, so the number of arrivals follows a Poisson distribution. The average interarrival time of level-$i$ communication tasks is $1/\lambda_i$.
Queuing rules: when no task is being scheduled in the system, an arriving communication task is served immediately through the background task execution call. Higher-priority communication tasks are executed in sequence until all high-priority tasks in the queue have finished. When a high-priority task arrives while a low-priority task is executing, it preempts the low-priority task, which returns to the queue. Communication tasks of the same priority follow the FCFS rule.
Service process: the WSN uses the state machine to drive communication task processing. Assume that the service time of each communication task is exponentially distributed and that the average service rate of level-$i$ communication tasks is $\mu_i$.
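The queuing rules above, preemption by higher levels and FCFS within a level, can be sketched as a small event-driven simulation built on Python's `heapq`. It is a sketch under stated assumptions, not the authors' simulator. Service times are given deterministically so that the ordering is easy to check, a preempted task resumes with only its remaining work (preemptive-resume), and task names are assumed unique.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str       # assumed unique
    arrival: float
    level: int      # 1 = highest priority
    service: float  # total service time needed


def simulate(tasks):
    """Completion times under preemptive-resume priority with FCFS per level."""
    pending = sorted(tasks, key=lambda t: t.arrival)
    remaining = {t.name: t.service for t in pending}
    ready = []  # heap ordered by (level, arrival, seq): priority first, then FCFS
    finish = {}
    clock, i, seq = 0.0, 0, 0
    while i < len(pending) or ready:
        if not ready:  # sensor idle: jump to the next arrival
            clock = max(clock, pending[i].arrival)
        while i < len(pending) and pending[i].arrival <= clock:
            t = pending[i]
            heapq.heappush(ready, (t.level, t.arrival, seq, t.name))
            seq += 1
            i += 1
        name = ready[0][3]  # task currently holding the state machine
        next_arrival = pending[i].arrival if i < len(pending) else float("inf")
        # Serve until the task completes or a new arrival may preempt it.
        run = min(remaining[name], next_arrival - clock)
        clock += run
        remaining[name] -= run
        if remaining[name] <= 0:
            heapq.heappop(ready)
            finish[name] = clock
    return finish
```

For example, a level-2 task arriving at time 0 with 5 units of work is preempted by a level-1 task arriving at time 1 with 2 units. The level-1 task finishes at 3, and the level-2 task resumes and finishes at 7.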
The WSN communication task queue performance parameters [21] are as follows: (a) absolute throughput is the average

Priority Queue with Two Classes of Tasks
Assume the queuing system uses preemptive priority. Level-$i$ task arrivals follow a Poisson distribution with parameter $\lambda_i$, and service times are exponentially distributed with parameter $\mu_i$ ($i = 1, 2$). Level-1 communication tasks have priority over level-2 tasks. The system state is $S = \{(i, j);\ 0 \le i,\ 0 \le j\}$, where $i$ ($j$) is the number of level-1 (level-2) communication tasks. The system state space distribution is $P(i, j) = \{p_{i,j},\ 0 \le i,\ 0 \le j\}$. Figure 5 depicts the system transition process.
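The transition process over the states $(i, j)$ can be written out explicitly. The sketch assumes the standard continuous-time Markov chain for a two-class preemptive priority M/M/1 queue, in which level-2 tasks are served only when no level-1 task is present. The optional per-class buffer cap `cap` is a hypothetical truncation for numerical work, and `transitions` is an illustrative name.

```python
def transitions(i, j, lam1, lam2, mu1, mu2, cap=None):
    """Outgoing transitions ((next_state), rate) of state (i, j).

    i, j: numbers of level-1 and level-2 tasks in the system.
    cap:  hypothetical per-class buffer limit (None = unlimited).
    """
    out = []
    if cap is None or i < cap:
        out.append(((i + 1, j), lam1))   # level-1 arrival
    if cap is None or j < cap:
        out.append(((i, j + 1), lam2))   # level-2 arrival
    if i > 0:
        out.append(((i - 1, j), mu1))    # level-1 service preempts level 2
    elif j > 0:
        out.append(((i, j - 1), mu2))    # level 2 served only when i == 0
    return out
```

For example, state (1, 2) has a level-1 departure at rate $\mu_1$ but no level-2 departure, because the level-1 task holds the server.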

Tasks Performance Indicators in Sensor Network
Priority task processing works as follows. Level-1 tasks arrive with parameter $\lambda_1$ and level-2 tasks with parameter $\lambda_2$. Arrivals of each level follow a Poisson distribution with parameter $\lambda_i$, and service times follow a negative exponential distribution with parameter $\mu_i$ ($i = 1, 2$). Level-1 tasks have priority over level-2 communication tasks. Assume $\lambda_1 = 170$, $\lambda_2 = 300$, $\mu_1 = 500$, and $\mu_2 = 700$.
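With these parameters the total load is $170/500 + 300/700 \approx 0.77 < 1$, so the queue is stable. The sketch below computes mean sojourn times from the standard textbook result for an M/M/1 queue with preemptive-resume priority. That result is swapped in for illustration and is not necessarily the paper's own formula. It assumes the four values above are $\lambda_1, \lambda_2, \mu_1, \mu_2$ in that order. For exponential service, $E[S^2]/2 = 1/\mu^2$.

```python
import math


def preemptive_sojourn(lams, mus):
    """Mean sojourn time per class, M/M/1 preemptive-resume priority.

    Index 0 is the highest priority. Standard result:
        T_k = (1/mu_k) / (1 - s_{k-1}) + R_k / ((1 - s_{k-1}) (1 - s_k)),
    where s_k = sum_{i<=k} rho_i and R_k = sum_{i<=k} lam_i / mu_i**2.
    """
    rhos = [lam / mu for lam, mu in zip(lams, mus)]
    if math.fsum(rhos) >= 1:
        raise ValueError("unstable: total load must be below 1")
    result = []
    s_prev = 0.0   # load of strictly higher-priority classes
    r = 0.0        # mean residual work of classes up to k
    for lam, mu, rho in zip(lams, mus, rhos):
        s_k = s_prev + rho
        r += lam / mu ** 2
        result.append((1 / mu) / (1 - s_prev) + r / ((1 - s_prev) * (1 - s_k)))
        s_prev = s_k
    return result


t1, t2 = preemptive_sojourn([170, 300], [500, 700])
w1, w2 = t1 - 1 / 500, t2 - 1 / 700   # mean time spent not being served
```

For the top class the formula reduces to the plain M/M/1 result $1/(\mu_1 - \lambda_1) = 1/330$, because level-1 tasks never see level-2 tasks. The level-2 sojourn time is larger because of preemption.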
(1) Steady-State Queue Length. Figure 6 shows the probability versus the queue length as $\lambda_2$ increases. The vertical axis is the probability and the horizontal axis is the queue length. In the probability matrix, the queue length at which the state probability becomes 0 gives the maximum queue length of the system. This length ensures that the task buffer is large enough and lets the task loss be calculated.
In the simulation, the maximum queue length is 40. Once the state probability reaches 0, the probability of a further task arrival is also 0, so the queue cannot grow any longer. The length at which the probability reaches 0 can therefore be regarded as the largest queue length.
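In an M/M/1 queue the state probabilities $(1-\rho)\rho^n$ never become exactly 0. A practical reading of "the length at which the probability reaches 0" is therefore the first length whose probability falls below a small tolerance. The sketch assumes this tolerance-based definition and uses the single-class M/M/1 distribution. That distribution is exact for the top-priority class, since it never sees lower-priority tasks. `eps` and the function name are hypothetical.

```python
def practical_max_queue_length(rho, eps=1e-6):
    """Smallest n with P(n) = (1 - rho) * rho**n below eps (M/M/1, 0 <= rho < 1)."""
    if not 0 <= rho < 1:
        raise ValueError("need 0 <= rho < 1")
    n = 0
    p = 1 - rho
    while p >= eps:
        n += 1
        p *= rho   # P(n) = P(n - 1) * rho
    return n
```

For example, with load $\rho = 0.5$ and tolerance 0.01, $P(6) = 0.0078$ is the first probability below the tolerance, so the buffer would be sized at 6. Heavier loads push this length up quickly.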
(2) Average Sojourn Time. Assume that arrivals of every task level follow a Poisson distribution. Level-$i$ tasks arrive with parameter $\lambda_i$, and their service time follows a negative exponential distribution with mean $1/\mu_i$. The average sojourn times $T_1$ and $T_2$ are calculated by the following formulas:
(3) Average Waiting Time [22]. In the same way, the average queue waiting times $W_1$ and $W_2$ are calculated as in formula (1).
(4) Wireless Sensor Usage Rate [23]. Let $P_0$ be the probability that the wireless sensor is idle. Then $\rho = 1 - P_0$, where $\rho$ is the occupancy probability of communication tasks [24,25]. The heavier the load, the greater the occupancy probability, so $\rho$ measures the service intensity (load) of the sensor. For the model to be practical, the queue length cannot be unlimited. To determine the performance indicators, we therefore use the M/M/1/N queuing model with buffer capacity $N$.
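The M/M/1/N indicators named above can be computed directly from the standard steady-state distribution $p_k \propto \rho^k$, $k = 0, \ldots, N$. The sketch below is not the paper's formula set. It derives $P_0$, the usage rate $1 - P_0$, the loss rate $p_N$, the throughput, and the mean queue length, and gets the mean sojourn time from Little's law. Weights are normalized numerically so that $\rho = 1$ needs no special case. The function name is hypothetical.

```python
import math


def mm1n_metrics(lam, mu, n_cap):
    """Steady-state indicators of an M/M/1/N queue (N = n_cap, lam > 0)."""
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    rho = lam / mu
    weights = [rho ** k for k in range(n_cap + 1)]
    total = math.fsum(weights)
    p = [w / total for w in weights]            # p_k, k = 0..N
    loss = p[-1]                                # arriving task finds buffer full
    lam_eff = lam * (1 - loss)                  # accepted tasks per unit time
    mean_len = math.fsum(k * pk for k, pk in enumerate(p))
    return {
        "P0": p[0],
        "utilization": 1 - p[0],                # rho = 1 - P0 in the text
        "loss": loss,
        "throughput": lam_eff,
        "mean_length": mean_len,
        "mean_sojourn": mean_len / lam_eff,     # Little's law
    }
```

A useful sanity check is flow balance. In steady state the usage rate $1 - P_0$ equals the throughput divided by $\mu$, because every accepted task eventually completes service.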
(5) Task's Throughput. The communication task's time $T$ has three parts: the task processing time $T_t$, the state machine processing time $T_s$, and the wireless sensor idle time $T_0$, with $T_t + T_s \approx T$. Task processing has priority over state machine processing. If the task service intensity is $\rho$, the task processing time is $T_t = \rho T$ and the state machine processing time is $T_s = (1 - \rho) T$.
The finite-automaton processing follows the M/M/1/N queuing model. Its input process with parameter $\lambda_s$ and its service time with parameter $\mu_s$ are negative exponentially distributed, and the buffer length is $N$. Communication tasks are processed faster than the state machine and take precedence over it. The actual processing capability of the state machine is therefore $\mu_s' = (1 - \rho)\mu_s$, and the loss rate follows from the M/M/1/N model with buffer length $N$, where in particular $\rho = \lambda_t/\mu_t$. The state machine throughput is then computed from formula (1). When the sensor network is severely overloaded, $\lambda_s \gg \mu_s'$ and $\lambda_s/\mu_s' \gg 1$. Applying formula (2) gives formula (5), which shows that the parameter $\rho$ governs task processing performance: the greater $\rho$ is, the lower the performance. When the communication task processing rate and the state machine processing rate remain unchanged, the performance curve is a straight line with slope $-\mu_s/\mu_t$.
(6) Wireless Sensor Processing Capacity. Assume the processing capacity is $C$ MIPS, task processing executes $I_1$ operations, and the state machine executes $I_2$ operations. The task processing capacity is then computed by formula (3), and formula (6) gives the corresponding service rates. Substituting formula (6) into formula (3) yields the loss rate, formula (7). Formula (7) relates the loss rate to the processing capacity and helps determine the processing requirements of the tasks.
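The chain from processing capacity to loss rate and throughput can be sketched numerically. This sketch makes three assumptions that the text implies but does not state. Service rates are capacity divided by operations per task ($\mu_t = C/I_1$, $\mu_s = C/I_2$). The loss rate is the M/M/1/N blocking probability at the reduced rate $\mu_s' = (1-\rho)\mu_s$. The capacity value is hypothetical, and $I_1 = 400$ and $I_2 = 4000$ are the operation counts of the experiment. Under overload the throughput approaches $\mu_s' = \mu_s - (\mu_s/\mu_t)\lambda_t$, a straight line in $\lambda_t$ with slope $-\mu_s/\mu_t$.

```python
import math


def mm1n_loss(rho, n_cap):
    """M/M/1/N blocking probability p_N for load rho."""
    weights = [rho ** k for k in range(n_cap + 1)]
    return weights[-1] / math.fsum(weights)


def state_machine_performance(capacity_mips, i1, i2, lam_t, lam_s, n_cap):
    """Return (mu_t, mu_s, mu_s_eff, loss, throughput); rates per second."""
    mu_t = capacity_mips * 1e6 / i1   # assumed: task service rate = C / I1
    mu_s = capacity_mips * 1e6 / i2   # assumed: state-machine rate = C / I2
    rho_t = lam_t / mu_t              # share of time taken by task processing
    if rho_t >= 1:
        raise ValueError("task processing alone saturates the sensor")
    mu_s_eff = (1 - rho_t) * mu_s     # mu_s' = (1 - rho) mu_s
    loss = mm1n_loss(lam_s / mu_s_eff, n_cap)
    throughput = lam_s * (1 - loss)
    return mu_t, mu_s, mu_s_eff, loss, throughput


# Hypothetical 1-MIPS sensor, heavily overloaded state machine (lam_s >> mu_s').
mu_t, mu_s, mu_eff, loss, thr = state_machine_performance(1, 400, 4000, 500, 1e6, 40)
thr2 = state_machine_performance(1, 400, 4000, 1000, 1e6, 40)[4]
```

Raising $\lambda_t$ from 500 to 1000 lowers the saturated throughput from about 200 to about 150 tasks per second. That is a slope of $-0.1 = -\mu_s/\mu_t$, consistent with the straight-line behavior described above.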

Experiment Set.
We evaluated our matching approach with regular expression sets extracted from two real-world systems, L7-Filter and Snort. L7-Filter is a well-known open-source application-layer traffic classifier for Linux. It reassembles the payload content of a flow and identifies the flow's application-level protocol through regular expression matching. Snort is a well-known open-source intrusion detection system. It can be configured to perform protocol analysis and content inspection on online traffic, detecting probes and a variety of worms. The two sets are used as $R = \{R_1, R_2, \ldots, R_n\}$ in the experiments. Table 1 lists the experiment parameters ES and MP, whose locally optimal values were obtained during our experiments.
Print size denotes the memory occupied by the prints after the profiteering stage. Group number is the number of groups of correlated regular expressions formed according to the grouping parameter. Average similarity is the average similarity within each group (see (2)). Packet size is the length of the test packets. Our simulation environment is based on NS-2 (Network Simulator, version 2), with the number of nodes ranging from 20 to 100. The number of suspicious packets is the number of packets that must be verified after the profiteering stage. Lastly, we calculate the experimental efficiency from the average execution cost: Efficiency = (Number of suspicious packets / Total number of packets) × ( ( )/Group Number). Table 2 shows efficiency gains ranging from 73.17% to 89.73%. We also compared regular expression matching under our strategy with a conventional approach. Figure 7 shows how the average hop count and the average ( ) vary with the number of nodes. A significant difference appears: with more than 30 nodes, the number of hops under our approach decreases sharply. Figure 7 thus shows that the matching approach tests and verifies packets efficiently when there are more than 30 wireless nodes. This indicates that our approach adapts well to medium-scale and large-scale distributed wireless sensor networks. With a small group of wireless nodes, on the other hand, there is no major difference in average hops. Compared with other end-to-end strategies [9], our approach offers a scalable way to build an intrusion detection system from distributed wireless sensor nodes. With appropriate parameters, our system monitors network attacks effectively.

Experiment of Queue Model.
The experiment compares two groups of performance results: those computed by queuing theory and those obtained from the NS2 software. The NS2 platform simulated an STM32W108 sensor network. We found that performance is affected by the following parameters: the average sojourn time of communication tasks, the average queue waiting time of tasks, the occupancy rate, and the task throughput. The simulated communication tasks fall into two categories, high-priority and low-priority. The task scheduling function is designed so that $I_1$ is approximately 400 operations and $I_2$ is about 4000 operations. The program triggers communication under different experimental parameters to test their influence on wireless sensor performance. The experimental results are compared with those calculated by the queuing-theory method to verify the credibility of the method. Table 3 shows the statistics.
When the sensor is overloaded, the sensor network spends a long time handling tasks, and state machine processing receives no support from the sensor network. The actual processing speed of the sensor is then far below its nominal rate. When the WSN has specific requirements on the loss rate and sensor throughput, the scheduling speed can be chosen to meet them.
Formula (7) relates the loss rate to the scheduling capacity. It therefore makes it easy to choose suitable sensors for the network, which can greatly improve the quality of service.

The Methods of Analysis.
Previous work on finite automata addressed regular expression matching for system security in WSNs but did not study performance evaluation. That matching method can, to some extent, maintain precision and accuracy in certain systems and can be used in specific environments. The WSN performance evaluation we studied, by contrast, targets a general environment. It must therefore consider restrictions such as the buffer capacity, the queue length during processing, the buffer needed for task processing, and the working conditions. Combining the matching method with performance evaluation, as our research does, is a new topic. The method maintains system security, reduces the loss rate of communication tasks, and improves scheduling accuracy. The universal approach can be used