Adaptive Conflict-Free Optimization of Rule Sets for Network Security Packet Filtering Devices



Introduction
A key challenge of secure systems is the management of security policies, from high-level ones down to the platform-specific implementation. Security policies define constraints, limitations, and authorizations on data handling and communications. The growth of communication link speeds calls for improved performance of packet filtering devices, such as firewalls and secure Virtual Private Network (S-VPN) gateways. To improve performance while maintaining consistency, network security policies should be tailored to the network traffic. We specifically address computer-based packet filtering devices that do not use specialized hardware filters (e.g., based on FPGAs), and we refer to the vastly widespread sequential rule list model, which accounts for most of the common, computer-based filtering devices currently deployed.
The process of inspecting incoming packets and looking up the policy rule set for a match often results in CPU overload and packet delay or even loss. In practice, rule lists in well-maintained, operational packet filtering devices do not exceed a few hundred active rules. Packets that match rules near the top of the list require little computation time compared to those that require scanning the whole rule set. The processing load per packet becomes increasingly critical as the input line speed increases and as packet filtering functions are assigned to a larger number of inexpensive, relatively simple devices. Packets matching rules at the bottom of the list are far from unusual; for example, undesired or unpredicted traffic is typically handled only by the final "deny all" rule.
In this paper, we aim to save CPU power by shaping the rule set to the network traffic impacting the device. The idea is to give high priority to rules that intercept a large fraction of the current traffic. Algorithms aimed at improving packet filter processing time are presented in [1][2][3][4][5][6]. The optimization is nontrivial because of dependencies among rules, which constrain rule reordering. Disregarding such dependencies can make the policy implemented by the device rule set inconsistent. As reported in a number of works [7][8][9][10][11], conflicts among rules can cause security holes, which are often hard to detect.
We develop an algorithm to solve the rule set optimization problem under the constraint that the reordered rule set be conflict-free. Building on this approach, already proposed, for example, in [5], we extend the optimization algorithm with the extraction of new rules from the "deny all" rule, in order to further improve packet processing time by capturing undesired packet flows that do not match any of the existing rules. The new rules are inserted in the rule set so as to keep the processing load optimized with respect to the current traffic mix. The overall optimization procedure is named Adaptive Conflict-Free Optimization (ACO). Our test results show that extracting rules from the "deny all" rule, as done in ACO, can improve the CPU performance of packet filtering devices and can reduce the impact of DoS (Denial of Service) and DDoS (Distributed Denial of Service) attacks.
We outline an adaptive procedure that automatically launches ACO according to the traffic profile measured at device interfaces, aiming to strike a balance between device configuration updates and obtainable performance gains in a time-varying environment, where the traffic mix changes over time. Information on the network traffic mix is retrieved from log files collected by packet filtering devices. Using log files directly from the packet filtering device allows us to define an adaptation algorithm that can be used for different kinds of filtering, whatever the fields exploited by the rules (e.g., header-based or based on application-level payload strings).
Our aim is to show how relatively simple means can yield a performance improvement without deeply affecting the hardware and software of currently deployed devices, especially in access networks, where such devices are numerous and based on relatively cheap, off-the-shelf machines.
The paper is organized as follows. In Section 2 we describe related work. In Section 3 we introduce the operational scenario and the software tools we have developed to run ACO. A detailed description of the ACO algorithm is provided in Section 4. Section 5 outlines the algorithm that launches ACO adaptively, according to the measured traffic mix. In Section 6 we describe experimental results based on a laboratory test-bed, aimed at measuring ACO performance improvement and effectiveness against DoS attacks. Finally, we give some concluding remarks in Section 7.

Related Work
In [12] the Policy Core Information Model (PCIM) is described as an object-oriented model for representing policy information, as extensions to the Common Information Model (CIM) activity within the Distributed Management Task Force (DMTF: http://www.dmtf.org/). The definitions of policy and policy rule presented in PCIM and in its extension in RFC-3198 [13] gave Basile and Lioy [14] the starting point to refine these concepts for a formal approach. Hari et al. [7] aim at detecting whether firewall rules are correlated with each other, while in [8,9] a set of techniques and algorithms is defined to detect all policy conflicts. Along this line, [10] and [11] provide an automatic conflict resolution algorithm for a single firewall and a tuning algorithm for multiple cooperating firewalls, respectively.
In parallel, great emphasis has been placed on how to optimize the performance of packet filtering devices. The recent review in [15] offers a systematic comparison of traffic-aware approaches to rule-based traffic filtering in security devices. In [16] a simple algorithm based on rule reordering is presented. This work describes rule dependencies using Directed Acyclic Graphs (DAGs), yet it does not provide a methodology to build the DAG of a given device. In addition, the proposed algorithm is unfeasible in real environments with large rule sets and complex graphs. Frameworks and methodologies to inspect and analyze both multidimensional firewall rules and traffic log information are proposed in [1][2][3]. In [1,2] the optimization tool uses current traffic characteristics to define the rule set ordering so as to minimize the operational cost of the firewall. Four schemes are used to achieve this goal (hot caching, total reordering, default proxy, and online adaptation). In [3] an adaptive firewall optimization framework, named OPTWALL, is proposed; it is built to reflect the current traffic pattern into rule sets. A limitation of [1][2][3] is that they define neither when the update process must be started nor the weight parameters used in the rule size estimation. The approach proposed in [4] optimizes performance by rule reordering, but it does not define how to create the statistics needed for rule weight estimation or how to find dependency relations between rules. In [5] an algorithm to optimize firewall performance is presented; it orders the rules according to their weights and considers two factors to determine the weight of a rule: rule frequency and recency, which reflect the number and time of rule matches, respectively. The authors present two types of update: performance-based triggered update and time-based periodic update. We adopt a similar approach, also taking into account the further optimization brought about by breaking up the default "deny all" rule.
Reference [6] presents a process of managing firewall policy rules, consisting of anomaly detection, generalization, and policy update using Association Rule Mining and frequency-based techniques. However, complex distributed networks with multiple firewalls and log acquisition are not contemplated. TCAM-based fast packet classification is proposed in [17]. However, TCAMs are expensive and power hungry, as pointed out, for example, in [18]. Efficient packet classification by means of specially designed software is tackled in [19]. In [18], after a wide review of many alternatives, Lim et al. propose and analyse the Boundary Cutting algorithm. It leads to a decision tree data structure that can be optimised to yield good search complexity even in very big rule lists (on the order of 100000 rules). A heuristic approach is explored in [20], looking for a compromise between memory-efficient trie data structures and search-efficient decision trees. Detection of specific packets is considered in [21], where a randomised algorithm is used: the emphasis here is on isolating specifically targeted packets from the mass of wire traffic. Even though decision trees can considerably improve raw search performance, complexity, power consumption, and cost often call for simpler realizations of packet filtering devices. So, adapting the rule list to the current traffic load remains a valid concept. Following that concept, an approach similar to ours, yet based on a more complex algorithm than the one we have developed, is defined in [22]. In [23,24] different traffic-aware packet classification algorithms are defined, without specifically considering the traffic-adaptive optimization obtained by extracting detailed rules from the "deny all" rule. The rejection of massive undesired traffic is addressed in [25]. That approach can be seen as complementary to the one proposed here, which is based on the extraction of new rules from the "deny all" rule.
A third related issue is the impact of extracting rules from the "deny all" rule. The few works on this topic [1,4] do not demonstrate whether, and in which cases, this action reduces CPU processing time. Moreover, those works do not detail how many rules should be extracted and in which priority order. We give an extraction algorithm coupled with rule set optimization and demonstrate that it can help relieve the effect of Denial of Service (DoS) attacks on packet filtering devices. DoS attacks attempt to exhaust or disable access to resources at the victim. These resources are network bandwidth, computing power, or operating system data structures. In flooding attacks, one or more attackers send streams of packets aimed at overwhelming link bandwidth or computing resources at the victim [26]. This type of attack, defined in [27], can be really dangerous because it can also be performed by using many unaware sources (Distributed DoS), thus reaching a huge spread and volume, as shown in [28], where a three-week analysis of a network is reported that found more than 12000 DoS attacks. In particular, we focus on a flooding attack towards a firewall, aimed at making the packet filtering device collapse by means of a huge quantity of messages matching the "deny all" rule.
Current packet filtering technologies exploit traffic-adaptive mechanisms, such as caching access list matches [29]. In particular, the device stores a hash table whose entries match active packet flows and point at the corresponding rule/action of the rule set (cache association). This allows scanning the rule set only for the first packet of each active flow. Although this method adapts to network traffic, its efficiency decreases as the size of the hash table grows. Moreover, this approach is ineffective with a large number of different undesired packet flows.
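The cache association described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: `make_cached_classifier` and the `lookup` callback are hypothetical names, the flow key stands for the packet 5-tuple, and no entry eviction is modeled, whereas real devices bound the table size.

```python
from typing import Callable, Dict, Hashable, Tuple

Result = Tuple[int, str]  # (rank of the matching rule, action)

def make_cached_classifier(lookup: Callable[[Hashable], Result]):
    """Wrap a first-match rule lookup with a per-flow result cache.

    `lookup` is a hypothetical callback that scans the rule list for a flow key
    (e.g., the 5-tuple). Entries are never evicted in this sketch.
    """
    cache: Dict[Hashable, Result] = {}
    stats = {"scans": 0}

    def classify(flow_key: Hashable) -> Result:
        cached = cache.get(flow_key)
        if cached is not None:
            return cached            # packet of an already active flow: no scan
        stats["scans"] += 1          # first packet of the flow: full rule scan
        result = lookup(flow_key)
        cache[flow_key] = result     # cache association: flow -> rule/action
        return result

    return classify, cache, stats
```

Every new flow key still costs a full scan and a new table entry, so a flood of distinct undesired flows (e.g., with spoofed source addresses) defeats the cache, as noted above.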
Finally, we briefly mention other research directions on packet filtering devices. High-speed packet filtering by means of specialized and optimized hardware is a prolific topic; for example, some recent works address the use of FPGAs (e.g., [30][31][32][33]). These works focus on optimized hardware design or on rule matching search techniques that can be conveniently implemented with FPGAs. Instead, in this work we assume that a general-purpose computer server runs the filtering machine, which is typical of access network devices. Another approach focuses on defining an efficient compiler to produce an optimized implementation of a high-level policy list, so as to minimize match search complexity (e.g., see [34,35]). These works focus on optimizing the code implementing the filtering machine for the given list of rules, while our approach aims at optimally adapting the ordering of the rule list to the currently analyzed traffic mix. These can be seen as complementary points of view.

System Architecture
3.1. Definitions and Notation.
We assume that security policies are translated into an ordered list of predicates of the form $C \to A$, where $C$ is a condition and $A$ is an action. We refer to predicates implementing security policies as rules. For security gateways and packet filtering devices, the actions $A$ that can be carried out on a packet are allow or deny (in IPSec gateways a third possible action applies to packets belonging to an activated security association that need to be encrypted and/or protected for authentication and integrity check). The condition of a rule is obtained as the logical AND of a number of conditions of the type: "selector value from packet header/payload belongs to a given interval or set/matches the given string." For example, the classic implementation of network-level packet filtering devices considers five selectors: (1) PT = protocol type, whose values can be represented by eight-bit integers, that is, they range between $0$ and $2^8-1$; (2) SA = source address and DA = destination address, whose values can be represented in dotted decimal notation and correspond to integers ranging from $0$ up to $2^{32}-1$ (for IPv4); (3) SP = source port and DP = destination port, whose values can be represented by sixteen-bit integers, that is, they range between $0$ and $2^{16}-1$.
A condition is specified by giving an interval of values for each selector; that is, a condition can be viewed as an interval contained in the five-dimensional, finite lattice space $\mathcal{S}^5$ defined by
$$\mathcal{S}^5 = \{0,\dots,2^8-1\} \times \{0,\dots,2^{32}-1\}^2 \times \{0,\dots,2^{16}-1\}^2 .$$
Different selectors could be considered, possibly involving header fields belonging to layers other than the network layer, for example, the application layer, or using strings taken from the packet payload. For example, a URL can be used in the rule condition. The basic structure of the list as a sequence of rules does not change, though. In the end, the predicates reduce to text strings or to numeric intervals. The selected fields of each packet are checked against the predicates to verify whether they correspond to the string value or fall within the interval range.
Given a rule set organized as an ordered list, each packet delivered to the packet filtering device interfaces is checked against each rule, following the rule ordering, until the first matching rule is found. Then, the action of the matching rule is applied. The last rule, $r_{N+1}$, is usually a "deny all," that is, a rule with wildcards for each condition field. The "deny all" discards any packet that has not matched any previous rule, so it implements the principle that anything that is not explicitly allowed must be denied. We assume there is always a "deny all" at the bottom of the rule list.
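A minimal Python sketch of first-match lookup over the five-selector interval conditions defined above. `Rule`, `first_match`, and the tuple layout (PT, SA, DA, SP, DP) with IPv4 addresses encoded as 32-bit integers are illustrative assumptions, not the interface of any real device. The returned rank is the number of rules tested, that is, the per-packet cost used in Section 4.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # inclusive [lo, hi]

@dataclass(frozen=True)
class Rule:
    # One inclusive interval per selector, in the order (PT, SA, DA, SP, DP).
    box: Tuple[Interval, ...]
    action: str  # "allow" or "deny"

    def matches(self, pkt: Sequence[int]) -> bool:
        # Logical AND of the per-selector interval conditions.
        return all(lo <= v <= hi for v, (lo, hi) in zip(pkt, self.box))

FULL = ((0, 2**8 - 1), (0, 2**32 - 1), (0, 2**32 - 1), (0, 2**16 - 1), (0, 2**16 - 1))
DENY_ALL = Rule(FULL, "deny")  # wildcard on every selector

def first_match(rules: List[Rule], pkt: Sequence[int]) -> Tuple[int, str]:
    """Return (1-based rank of the first matching rule, its action).

    Assumes the list ends with a "deny all" rule, so a match always exists.
    """
    for rank, rule in enumerate(rules, start=1):
        if rule.matches(pkt):
            return rank, rule.action
    raise ValueError("rule list must end with a deny-all rule")
```

For instance, with a rule allowing TCP (PT = 6) to port 80 of 10.0.0.1 (the integer 167772161) placed above `DENY_ALL`, a web packet is resolved at rank 1, while a UDP packet falls through to the "deny all" at rank 2.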
The processing cost per packet is proportional to the depth of the matching rule.Hence, it can be reduced by reordering the rules according to the fraction of the input load that matches each rule, under the constraint of maintaining the dependencies among rules.The adaptation algorithm of a tagged device is triggered only when the analysis of the overall hit ratios of rules of that device points out that a significant shift of the aggregated traffic mix through the tagged device has taken place.The traffic mix is monitored through the logs produced by the device itself, as detailed in the ensuing subsection.

Networking Scenario.
The considered scenario is made up of packet filtering security devices deployed in a managed network.Network Management Systems (NMS) allow administrators to handle device configurations (rule lists) and to monitor packets flowing through devices using log messages collected and stored by the packet filtering device.
The overall architecture of the automated and adaptive policy management system that we have built is depicted in Figure 1. The complete system comprises a policy conflict resolution tool, a log management infrastructure, and a tool that, based on log messages collected from all devices in the network, estimates rule matching ratios and automatically and adaptively triggers the rule set optimization process based on traffic statistics. The focus of this paper is on the optimization and adaptation part of the entire project.
All packet filtering devices, such as firewalls and security gateways, are set up to collect and send log messages reporting on the packets they allow or deny as a normal part of their operation. We exploit this feature for ACO. The analysis of log messages allows us to figure out (i) the real-time traffic profile, without using further devices such as network agents; (ii) how many rules are active and how many packets match each rule.
A monitoring infrastructure is developed in order to collect and store log information in a log database (LogDB). In our testbed, logs collected from devices are sent using the "syslog" standard [36,37]. Any other format could be used as well, provided it is "spoken" by both the device and the LogDB host. Figure 2 shows example data stored in LogDB. The stored fields are the following.
(i) IP address is retrieved from the "syslog" packet. It identifies a device interface on the network.
(ii) Device type specifies the rule list type; a device could be configured with both FW and IPSec access lists (this is an optional field).
(iii) Rule rank is the offset of the rule reported by the log with respect to the top of the list that the rule belongs to.
(iv) Count is the number of packets that match that rule.
The optimization tool box in Figure 1 contains the ACO algorithm. It retrieves the IP addresses of the device interfaces to the networks and the device rule sets from the DCDB. For each device, ACO retrieves rule hit numbers from LogDB. Then it calculates rule weights and hence rule costs. These are the input parameters to the optimization algorithm (see Section 4).
Log centralization is the typical architecture used in current corporate and telco networks. Our architecture aims at showing how to exploit the log data collection of the NMS also to improve the efficiency of packet filtering devices. Log reporting and rule list updating are normally implemented functions, and a LogDB is available in most networks independently of ACO. ACO exploits those functions for its own purposes, namely, to enhance the efficiency of packet filtering devices and to harden them against DoS attacks.
Packet filtering devices of the managed network are monitored and the ACO algorithm is started when at least one of the following events occurs: (i) rule set is modified by the administrator (such as rule insertion, modification, or removal); (ii) network traffic changes, that is, a new flow starts, or an existing flow varies its bit rate or terminates.
The first criterion is motivated mainly by the need to check policy consistency, and the second one by the need to optimize performance by adapting to traffic. We outline an algorithm for ACO automation in Section 5, specifically for this second situation. That is the part referred to as "Intelligent Decision Support System" in Figure 1.

Adaptive Conflict-Free Optimization (ACO) Algorithm Description
Let $\mathbf{R} = [r_1, \dots, r_N, r_{N+1}]$ be the ordered, conflict-free rule list provided as input to ACO; $N$ is the number of rules besides the last rule, $r_{N+1}$, which is assumed to be "deny all." ACO aims at minimizing packet processing times under the constraint of maintaining a conflict-free rule list. For a detailed discussion and formalization of security policy conflicts in a rule list see [7,8,10]. It suffices to say that, in a conflict-free list, any pair of rules must be either disjoint or in an inclusive matching relation. The consequence for reordering is that whenever two rules, say $r_i$ and $r_j$, have nondisjoint domains, that is, there exists at least one packet that matches both of them, those two rules are said to be dependent and their ordering must be preserved as given in the input rule list.
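The dependency notion above reduces to interval intersection and containment, checked selector by selector. The helper names below are hypothetical; each rule condition is given as one inclusive (lo, hi) pair per selector, and any number of selectors works.

```python
from typing import Sequence, Tuple

Box = Sequence[Tuple[int, int]]  # one inclusive (lo, hi) interval per selector

def intersects(a: Box, b: Box) -> bool:
    """True if some packet matches both rules (overlap along every selector)."""
    return all(max(alo, blo) <= min(ahi, bhi) for (alo, ahi), (blo, bhi) in zip(a, b))

def contains(outer: Box, inner: Box) -> bool:
    """True if every packet matching `inner` also matches `outer`."""
    return all(olo <= ilo and ihi <= ohi for (olo, ohi), (ilo, ihi) in zip(outer, inner))

def relation(a: Box, b: Box) -> str:
    if not intersects(a, b):
        return "disjoint"    # independent: any relative order is allowed
    if contains(a, b) or contains(b, a):
        return "inclusive"   # dependent: the input order must be preserved
    return "conflict"        # partial overlap: excluded in a conflict-free list
```

A conflict-free list is one in which `relation` never returns `"conflict"` for any pair of rules other than the "deny all".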
The optimization process defines a new rule list $\mathbf{R}^*$, which includes the rules $r_i$, $i = 1, \dots, N$, possibly reordered, and $r_{N+1}$ ("deny all") as the last rule. Further optimization, obtained by merging into the rule list also rules extracted from $r_{N+1}$, is discussed in Sections 4.1 and 4.2. The optimized list $\mathbf{R}^*$ must be equivalent to $\mathbf{R}$ from the point of view of security policy implementation. Formally, for each given packet $p$ entering the device interfaces, the action performed by the device under $\mathbf{R}$ and $\mathbf{R}^*$ must be the same.
In the following, the subscript of rules refers to their rank in the original rule list. Let $y_i$ denote the rank of $r_i$ in the (possibly reordered) list, $i = 1, \dots, N+1$ (since the "deny all" is always the last rule, $y_{N+1} = N+1$; moreover, in the original rule list $y_i = i$). The rank $y_i$ is proportional to the processing cost of matching $r_i$; that is, for every packet matching $r_i$, $y_i$ tests are required to check all rules until $r_i$ is hit. The weight of $r_i$ is $w_i = n_i/n$, where $n_i$ is the number of packets hitting $r_i$ and $n$ is the overall number of packets received by the considered packet filtering device. The quantities $n$ and $n_i$, $i = 1, \dots, N$, are obtained by collecting the device logs over an observation time interval. How to adapt the weights $w_i$ over time is discussed in Section 5.
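Since the weights sum to one and a packet matching $r_i$ costs $y_i$ tests, the weighted sum of ranks is the mean number of rule tests per packet. A small Python sketch with hypothetical function names, assuming every received packet is logged by exactly one rule (so that $n$ is the sum of the per-rule counts), computes the weights from log counts and that mean for a given ranking:

```python
from typing import List, Sequence

def rule_weights(hits: Sequence[int]) -> List[float]:
    """w_i = n_i / n; hits[i] is the log count of rule i (last entry: "deny all")."""
    total = sum(hits)  # n, under the assumption that every packet is logged once
    if total <= 0:
        raise ValueError("no packets observed")
    return [n_i / total for n_i in hits]

def mean_tests(weights: Sequence[float], ranks: Sequence[int]) -> float:
    """sum_i w_i * y_i: average number of rules tested per packet."""
    if len(weights) != len(ranks):
        raise ValueError("one rank per weight is required")
    return sum(w * y for w, y in zip(weights, ranks))
```

With counts [10, 60, 30], the original ranks [1, 2, 3] give 2.2 tests per packet; moving the second rule to the top (legal only if the first two rules are disjoint) gives 1.7.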
The cost of $r_i$ is therefore $c_i \equiv w_i \cdot y_i$. The overall cost $C_{\mathbf{R}}(\mathbf{y})$ of the list $\mathbf{R}$ for a given rule ranking $\mathbf{y} = [y_1 \cdots y_N]$ (any feasible ranking $\mathbf{y}$ is a permutation of the integer set $\{1, \dots, N\}$) is
$$C_{\mathbf{R}}(\mathbf{y}) = \sum_{i=1}^{N+1} c_i = \sum_{i=1}^{N+1} w_i\, y_i .$$
ACO outputs a rule set $\mathbf{R}^*$ that minimizes the packet processing cost:
$$\mathbf{y}^* = \arg\min_{\mathbf{y}} C_{\mathbf{R}}(\mathbf{y}),$$
under the constraint that the reordered list $\mathbf{R}^*$ be conflict-free and equivalent to $\mathbf{R}$; that is, if $r_i$ and $r_j$ ($i < j$) are dependent, it must be $y_i < y_j$. We can state the constraint in a way useful to the optimization algorithm by resorting to a Pseudo-Tree data structure describing the relationships among the rules, referred to as the Device Pseudo-Tree (DPT) associated with the given rule list. An implicit definition of the DPT goes as follows: rule $r_i$ is a child of rule $r_j$ if and only if $r_i \subset r_j$ and there does not exist any rule $r_k$ such that $r_i \subset r_k \subset r_j$, for $i \neq k \neq j$. Rules belonging to a conflict-free rule list, apart from the "deny all" rule, can be arranged in separate trees (possibly a single one) making up the DPT [10]. In each tree of the DPT there is a root node, which represents a rule that includes all the rules in the tree, and there are one or more leaves, which represent the most specific rules in the tree. Given the DPT associated with $\mathbf{R}$, the constraint is checked by just requiring that no rule be assigned a rank smaller than that of its child rule(s); that is, scanning the list from top to bottom we must find any parent rule after its own descendant rules (i.e., the rules of the subtree rooted at the considered rule). Obviously, rules associated with disjoint subtrees of the DPT can be placed in any relative order.
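The DPT and the ordering constraint can be checked mechanically. The Python sketch below, with hypothetical names, takes each rule's parent to be its smallest strict superset (the implicit definition above) and verifies that a candidate ranking puts every child above its parent. Rule conditions are given as one inclusive (lo, hi) pair per selector, the "deny all" is excluded, and a conflict-free list with distinct rule intervals is assumed. It only checks feasibility; it is not the ACO search of Appendix A.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Box = Sequence[Tuple[int, int]]  # one inclusive (lo, hi) interval per selector

def contains(outer: Box, inner: Box) -> bool:
    return all(olo <= ilo and ihi <= ohi for (olo, ohi), (ilo, ihi) in zip(outer, inner))

def dpt_parents(boxes: List[Box]) -> Dict[int, Optional[int]]:
    """Map each rule index to its DPT parent index (None for tree roots)."""
    parents: Dict[int, Optional[int]] = {}
    for i, bi in enumerate(boxes):
        supersets = [j for j, bj in enumerate(boxes) if j != i and contains(bj, bi)]
        # In a conflict-free list the supersets of r_i form a chain, so the
        # immediate parent is the superset contained in all the others.
        parent = None
        for j in supersets:
            if all(contains(boxes[k], boxes[j]) for k in supersets):
                parent = j
        parents[i] = parent
    return parents

def respects_dpt(ranks: Sequence[int], parents: Dict[int, Optional[int]]) -> bool:
    """ranks[i] is the position of rule i; every child must precede its parent."""
    return all(p is None or ranks[i] < ranks[p] for i, p in parents.items())
```

Checking only direct parent-child pairs is enough, since the constraint then holds for every ancestor by transitivity. The search for parents is quadratic in the number of rules, which is acceptable for lists of a few hundred rules.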
The detailed steps of the ACO algorithm are described in Appendix A. A full-blown example of the procedure is developed in Appendix B.

4.1. Extracting Rules from Deny All.
If a high-rate undesired flow matches the "deny all" rule, it can be convenient to extract a specific rule for that flow and place it at the optimum rank in the rule list. Extracted rules are always disjoint from all others in the rule set, so they do not cause additional conflicts and can be placed anywhere in the rule list. However, including extracted rules does not necessarily reduce the processing load, since each added rule increases by one the number of tests for every packet matching a rule placed below it.
In this section we define an algorithm for rule extraction from the "deny all" rule. It starts by identifying the minimum set of rules that covers the space of the denied traffic. As outlined in Section 3.1, the condition $C$ of a rule $r: C \to A$ corresponds to the interval of the five-dimensional lattice $\mathcal{S}^5$ described by the selector values specified in the rule condition $C$. We denote the interval associated with rule $r_i$ by $I(r_i)$.
Let $\mathcal{R}$ be the set of indices of the rules that are roots of the trees forming the DPT. The only rule more general than any $r_i$, $i \in \mathcal{R}$, is the "deny all" rule. So, for the nonredundancy of the rule list, the action associated with $r_i$, $i \in \mathcal{R}$, is necessarily allow. Then, the subspace $\mathcal{A} \subset \mathcal{S}^5$ comprising all allowed flows is given by
$$\mathcal{A} = \bigcup_{i \in \mathcal{R}} I(r_i).$$
Let us define $\mathcal{D}$ as the complementary space of $\mathcal{A}$ in $\mathcal{S}^5$; namely, $\mathcal{D} = \mathcal{S}^5 \setminus \mathcal{A}$. We are interested in the minimum partition of $\mathcal{D}$ into intervals; that is,
$$\mathcal{D} = \bigcup_{j=1}^{M} I(r_{D,j}),$$
where the intervals $I(r_{D,j})$ are disjoint. This partition is not unique and can be obtained efficiently, for example, by using the same techniques as in the ARC/PARC (Adaptive Resolution Classifier/Pruning ARC) min-max neurofuzzy classifiers [38].
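To make the construction of $\mathcal{D}$ concrete, the Python sketch below computes a disjoint interval cover of the complement by plain box subtraction. This is a swapped-in, simpler technique, not the ARC/PARC approach cited above, and it does not guarantee a minimum partition; all names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[Tuple[int, int], ...]  # one inclusive (lo, hi) interval per selector

def overlaps(a: Box, b: Box) -> bool:
    return all(max(alo, blo) <= min(ahi, bhi) for (alo, ahi), (blo, bhi) in zip(a, b))

def subtract(a: Box, b: Box) -> List[Box]:
    """Return disjoint boxes covering a minus b (at most two slabs per selector)."""
    if not overlaps(a, b):
        return [a]
    pieces: List[Box] = []
    rest = list(a)
    for d, ((lo, hi), (blo, bhi)) in enumerate(zip(a, b)):
        if lo < blo:   # part of a below b along selector d
            pieces.append(tuple(rest[:d] + [(lo, blo - 1)] + rest[d + 1:]))
        if bhi < hi:   # part of a above b along selector d
            pieces.append(tuple(rest[:d] + [(bhi + 1, hi)] + rest[d + 1:]))
        rest[d] = (max(lo, blo), min(hi, bhi))  # keep only the overlap and go on
    return pieces

def complement(space: Box, allowed: List[Box]) -> List[Box]:
    """Disjoint boxes covering space minus the union of the allowed boxes."""
    remaining = [space]
    for b in allowed:
        remaining = [piece for r in remaining for piece in subtract(r, b)]
    return remaining
```

In the setting above, `space` would be the full five-selector lattice and `allowed` the intervals $I(r_i)$ of the DPT roots; each returned box is a candidate extracted rule $r_{D,j}$.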
Once the intervals $I(r_{D,j})$ of the partition of $\mathcal{D}$ are found, we obtain the list $L_D$ of extractable rules, $L_D = [r_{D,1}, \dots, r_{D,M}]$, to be inserted in the optimized rule set $\mathbf{R}^*$ in order to reduce the device processing load. Only those rules of $L_D$ that lead to a significant saving of processing effort are included in $\mathbf{R}^*$. This depends on the weight $w_{D,j}$ of $r_{D,j}$, that is, the fraction of packets matching $r_{D,j}$ during the observation interval.
For each $r_{D,j}$ in $L_D$, the rule weight $w_{D,j}$ in the observation interval can be computed as $w_{D,j} = q_{D,j} \cdot w_{N+1}$, where $q_{D,j}$ is the share of packets blocked by the "deny all" rule that match $r_{D,j}$, and $w_{N+1}$ is the "deny all" weight. We assume that the rules $r_{D,j}$ are numbered so that they are listed in order of decreasing weight, $w_{D,1} \ge w_{D,2} \ge \cdots \ge w_{D,M}$, and let $L_D^* = [r_{D,1}, r_{D,2}, \dots, r_{D,M}]$ be the new list.
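A short Python sketch of the weighting and ordering step, assuming the shares $q_{D,j}$ have already been measured from the "deny all" logs; the function name and the 0-based indices are hypothetical choices.

```python
from typing import List, Sequence, Tuple

def order_extracted(shares: Sequence[float], w_deny: float) -> List[Tuple[int, float]]:
    """Return (j, w_Dj) pairs with w_Dj = q_Dj * w_deny, heaviest rule first.

    shares[j] is the fraction of "deny all" packets matching extracted rule j;
    the extracted intervals are disjoint, so the shares cannot sum above 1.
    """
    if sum(shares) > 1.0 + 1e-9:
        raise ValueError("shares of disjoint extracted rules cannot exceed 1")
    weighted = [(j, q * w_deny) for j, q in enumerate(shares)]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)
```

For shares [0.2, 0.5, 0.3] and a "deny all" weight of 0.4, the resulting order is rules 1, 2, 0, with weights 0.2, 0.12, and 0.08.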

4.2. Inserting Rules Extracted from the Deny All String.
This phase consists of the insertion into $\mathbf{R}^*$ of $k$ rules ($0 \le k \le M$), taken from $L_D^*$. Since the rules in $L_D^*$ are disjoint from one another and from the rules in $\mathbf{R}^*$, the extracted rules $r_{D,j}$ can be inserted in any position of $\mathbf{R}^*$ above the "deny all" without generating conflicts.
Given the rule list $\mathbf{R}^*$, with ranks $y_i^*$, let its cost be
$$C(\mathbf{R}^*) = \sum_{i=1}^{N+1} w_i\, y_i^* .$$
When $r_{D,j}$ is inserted into $\mathbf{R}^*$ with rank $h$ ($1 \le h \le N+1$), every rule previously at rank $h$ or below moves down by one position, and the "deny all" weight drops to $w_{N+1} - w_{D,j}$, since the packets matching $r_{D,j}$ no longer reach it. The following cost is obtained:
$$C(\mathbf{R}^* \cup r_{D,j}) = C(\mathbf{R}^*) + \sum_{i:\, y_i^* \ge h} w_i \; - \; w_{D,j}\,(N+2-h). \tag{7}$$
Equation (7) shows that $C(\mathbf{R}^* \cup r_{D,j})$ is a decreasing function of the weight $w_{D,j}$ for a given value of $h$. So, to reap the maximum gain (cost reduction), insertion should start from the extracted rule with the biggest weight. Once the optimum insertion location for this rule is found, the extracted rule with the second biggest weight can be considered, and so on. By virtue of the ordering of $L_D^*$, the insertion algorithm starts by considering $r_{D,1}$ and finds a value $h_1$, that is, the rank of $r_{D,1}$ in $\mathbf{R}^* \cup r_{D,1}$, that minimizes the overall rule list cost. To achieve this goal, an exhaustive search over the candidate ranks is performed. If the obtained minimum cost is less than the cost of the original list $\mathbf{R}^*$, then $\mathbf{R}^*$ is updated by adding the extracted rule $r_{D,1}$. The algorithm stores the updated list and its overall cost, and then it goes on evaluating the insertion of $r_{D,2}$ and so on, until it has evaluated the insertion of all $M$ rules of $L_D^*$. As a result of the insertion of extracted rules, we obtain $M$ expanded rule lists, $\mathbf{R}^*_k \equiv \mathbf{R}^* \cup r_{D,1} \cup \cdots \cup r_{D,k}$, of length $N+k+1$ (including the "deny all" rule), where the $k$ added rules have been assigned positions $h_1, \dots, h_k$, respectively, $k = 1, \dots, M$. The corresponding costs are denoted as $\Gamma_k \equiv C(\mathbf{R}^*_k)$; by extension, we also set $\Gamma_0 \equiv C(\mathbf{R}^*)$. Since any benefit brought by the insertion of $r_{D,j}$ grows with $w_{D,j}$, and the rules of $L_D^*$ are ordered by decreasing weights, the sequence of obtained costs $\{\Gamma_k\}_{0 \le k \le M}$ is unimodal; that is, it has a unique minimum, say for index $h^*$. Then $h^* \ge 0$ is the optimum number of rules to be extracted from the "deny all," and it can be found at a cost linear in $M$.
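The greedy insertion can be sketched in Python under a simplifying assumption: a rule list is represented only by its weights in rank order, with the "deny all" weight last, which is all the cost function needs. Function names are hypothetical, and the exhaustive search simply re-evaluates the full cost at every candidate rank; the closed form of Equation (7) would allow an incremental update instead.

```python
from typing import List, Sequence, Tuple

def cost(weights: Sequence[float]) -> float:
    """Sum of rank * weight, with ranks starting at 1."""
    return sum(rank * w for rank, w in enumerate(weights, start=1))

def insert_at(weights: Sequence[float], w_new: float, h: int) -> List[float]:
    """Insert an extracted rule of weight w_new at rank h (1 <= h <= len(weights)).

    Its packets no longer reach "deny all", whose weight is reduced by w_new.
    """
    *body, w_deny = weights
    if w_new > w_deny + 1e-12:
        raise ValueError("extracted weight exceeds the deny-all weight")
    return body[: h - 1] + [w_new] + body[h - 1:] + [w_deny - w_new]

def best_insertion(weights: Sequence[float], w_new: float) -> Tuple[float, int]:
    """Exhaustive search over all ranks above "deny all": (min cost, rank)."""
    return min((cost(insert_at(weights, w_new, h)), h) for h in range(1, len(weights) + 1))

def extract_and_insert(weights: Sequence[float], extracted: Sequence[float]):
    """Insert extracted weights heaviest first; return (k*, Gamma list, best list)."""
    current = list(weights)
    gammas, lists = [cost(current)], [current]
    for w_new in sorted(extracted, reverse=True):
        c, h = best_insertion(current, w_new)
        current = insert_at(current, w_new, h)
        gammas.append(c)
        lists.append(current)
    # Unimodality would allow stopping at the first increase; argmin is simpler.
    k_star = min(range(len(gammas)), key=gammas.__getitem__)
    return k_star, gammas, lists[k_star]
```

For instance, with weights [0.3, 0.2, 0.5] (cost 2.2), an extracted rule of weight 0.4 is best placed at rank 1, lowering the cost to 2.0; a rule of weight 0.001 raises the cost at every rank, so it is not extracted ($k^* = 0$).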

Traffic Driven Adaptation of ACO
The traffic mix at the input of a packet filtering device changes over time, so that each rule is matched by a varying number of packets as new traffic flows start and running ones end. The changing traffic mix impacts ACO, since the weights $w_i$ of the rule list cost function defined in Section 4 are just the fractions of packets matching rule $r_i$.
To address this issue we follow the same approach developed, for example, in [5,39]. We define an adaptive, event-driven mechanism to trigger ACO runs, including the rule extraction from "deny all." The key elements of our proposed mechanism are (i) collection of device log information; (ii) statistical testing based on log data, to estimate traffic mix variation over time; (iii) extraction of rules from "deny all," provided the cost of the added rules is more than compensated by the processing gain.
The logic of ACO adaptation is as follows. Let $t_k$ be the last time that the rule list of the tagged device has been updated. Logs are collected from the packet filtering device, so that the management system can track the number $n_i(k)$ of packets matching rule $r_i$ ($i = 1, \dots, N$) and the overall number $n(k)$ of packets that arrived at the device over the time interval $[t_k, t_k + T_s]$. The collection time $T_s$ is chosen so that enough logs are accumulated to obtain a statistically reliable estimate of the packet traffic fractions matching each rule; that is, $w_i(k) = n_i(k)/n(k)$, $i = 1, \dots, N$, and $w_{N+1}(k) = 1 - \sum_{i=1}^{N} w_i(k)$. The weight vector $\mathbf{w}(k) = [w_1(k) \cdots w_{N+1}(k)]$ estimated at time $t_k + T_s$ is compared to the previous one, $\mathbf{w}(k-1)$, which has been used to optimize the rule list at time $t_k$. We test the hypothesis that the two weight vectors are drawn from the same probability distribution by using the Chi Square test. If the hypothesis is inconsistent with the data (i.e., there is statistically reliable evidence that the input traffic mix has changed), a new optimization of the rule list is run, taking the new weights equal to $\mathbf{w}(k)$. Otherwise, a new collection period starts and the whole process repeats all over again.
The automation algorithm is run individually for each device. The processing can be centralized in a network management system, by downloading the logs accumulated by the filtering devices and storing them in the LogDB. Algorithm 1 summarizes the steps carried out by the ACO Decision Support System (ACO-DSS) to adapt the rule list to the filtered traffic mix. The ACO-DSS samples the LogDB to check whether the number of packets $n(k)$ listed in the collected logs for the considered device in the $k$th sampling interval of duration $T_s$ is larger than a threshold value TH1. If that is the case, the Chi Square statistical test is performed. If the test detects that the traffic mix has changed, ACO is run, including the extraction of rules from "deny all." The performance gain of the resulting optimised list is assessed and compared with a threshold TH3. The new list is implemented only if the performance gain is big enough.
The parameters TH1 and sampling time $T_s$ can be dimensioned based on the following guidelines. Let us consider the $k$th sampling interval, drop the subscript $k$ for simplicity, and let $\theta$ be the probability that a packet belongs to a given flow. An unbiased, consistent estimator of $\theta$ is $\hat\theta = m/n$, where $m$ is the number of packets belonging to that flow out of the $n$ logs collected in the considered interval. The relative root mean square error of this estimator is
$$E = \frac{\mathrm{RMSE}(\hat\theta)}{\mathbb{E}[\hat\theta]} = \sqrt{\frac{1-\theta}{n\theta}} < \frac{1}{\sqrt{n\theta}} \approx \frac{1}{\sqrt{m}} .$$
This can be made less than a given error $\epsilon$ (we set $\epsilon = 0.01$) by taking $m$ bigger than $1/\epsilon^2$ (10000 in our case). Accurate estimates of traffic flow rates are required especially for the largest flows, those that have the biggest impact on the processing resources of the device, so that their filtering can be optimized most profitably. Let $\beta$ be the fraction of the device maximum throughput $\Lambda$ such that we want accurate estimates for those flows offering at least $\beta\Lambda$ pkts/s. Then, we should set $T_s$ so that $\beta\Lambda T_s \ge 1/\epsilon^2$. For example, let $\Lambda = 42000$ pkts/s, as in Section 6, and let $\beta = 0.05$; that is, we aim at accurately estimating those flows whose rate is equal to or bigger than 5% of the device throughput. Then it must be $42000 \cdot 0.05 \cdot T_s \ge 10000$, whence $T_s \ge 4.76$ s. Even if the rate of the input packet flow is two orders of magnitude less than in the example above, the requirement on $T_s$ would still be on the order of some hundred seconds. The fine tuning of $T_s$ should be carried out in the specific networking environment where the packet filtering device is deployed. This issue is further discussed at the end of this section.
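The dimensioning rule above is easy to check numerically. The Python sketch below uses hypothetical function names, with `beta` the fraction of the maximum throughput `max_rate_pps` (the symbol Λ above) whose flows must be estimated within relative error `eps`.

```python
import math

def min_sampling_time(max_rate_pps: float, beta: float, eps: float) -> float:
    """Smallest T_s such that beta * Lambda * T_s >= 1 / eps**2 (seconds)."""
    return 1.0 / (eps ** 2 * beta * max_rate_pps)

def relative_rmse(theta: float, n: int) -> float:
    """sqrt((1 - theta) / (n * theta)) for the estimator theta_hat = m / n."""
    return math.sqrt((1.0 - theta) / (n * theta))
```

With the figures of the example, `min_sampling_time(42000, 0.05, 0.01)` gives about 4.76 s, and at 420 pkts/s the bound grows to about 476 s. Likewise, a flow with a 5% share observed over 200000 logs (about 10000 of its own packets) has a relative RMSE just under 0.01.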
The decision on whether the traffic mix has changed exploits the Chi Square test (CST), which determines whether the current sample weight vector belongs to the same probability distribution as the previous one (see also [39]). The choice of the significance level $\alpha$, namely, the probability of false positive errors, is guided by the observation that false positive errors are more critical than false negative errors. A false negative implies that a real shift of the traffic mix is overlooked: in that case all device rule lists stay the same, so they might turn out to be nonoptimized for the current traffic mix. In case of a false positive error, device rule lists would be updated erroneously, since a traffic mix change is estimated whereas no actual change has occurred. The choice of the $\alpha$ level depends on the error cost weighting of specific applications. We set $\alpha = 0.01$.

Algorithm 1 (first steps):
(1) for each device dev do
(2) if … then
(3) $n_j$ ← # logs matching rule $r_j$ for device dev
(4) $N$ ← # logs collected for device dev
(5) if $N \ge$ TH1 then
(6) $w_j \leftarrow n_j/N$ for each rule $r_j$ of device dev
(7) … ← $\sum$ …
Let $n_j = n_j(k)$ be the number of logs matching rule $r_j$ in the current observation interval and $m_j = n_j(k-1)$ the number of logs for which the rule list is currently optimized. The test variable $X$ is Pearson's chi-square statistic computed from these counts. The null hypothesis is that the outcomes $n_j$ are drawn from the same probability distribution as the $m_j$, $j = 1, \ldots, R+1$. Under the null hypothesis, $X$ is asymptotically distributed as a Chi Square random variable with $R$ degrees of freedom for large sizes of the collected log sample. Hence $X$ is compared with the Pearson threshold for the Chi Square test, namely, $\mathrm{TH2} \equiv \chi^2_{R,1-\alpha}$, the $(1-\alpha)$ quantile of the Chi Square random variable with $R$ degrees of freedom.
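The exact expression of $X$ is not reproduced here. As a sketch under stated assumptions, the Python code below uses Pearson's two-sample homogeneity statistic on the $2 \times (R+1)$ table of counts $n_j$, $m_j$ (which also has $R$ degrees of freedom) and approximates TH2 with the Wilson-Hilferty formula, so that only the standard library is needed. Both choices, and the function names, are assumptions; with SciPy available, `scipy.stats.chi2.ppf(1 - alpha, R)` gives the exact quantile.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def pearson_homogeneity(now: Sequence[int], ref: Sequence[int]) -> Tuple[float, int]:
    """Pearson chi-square statistic for two count vectors over the same
    categories (rules r_1..r_{R+1}); returns (X, degrees of freedom)."""
    n_now, n_ref = sum(now), sum(ref)
    if n_now == 0 or n_ref == 0:
        raise ValueError("both samples must contain at least one log")
    total = n_now + n_ref
    x, cells = 0.0, 0
    for a, b in zip(now, ref):
        col = a + b
        if col == 0:
            continue  # rule never matched in either sample: no information
        cells += 1
        for observed, row_total in ((a, n_now), (b, n_ref)):
            expected = row_total * col / total
            x += (observed - expected) ** 2 / expected
    return x, cells - 1


def chi2_quantile_wh(dof: int, p: float) -> float:
    """Wilson-Hilferty approximation of the p-quantile of chi-square(dof)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def mix_changed(now: Sequence[int], ref: Sequence[int], alpha: float = 0.01) -> bool:
    """True if the null hypothesis 'same distribution' is rejected (X > TH2)."""
    x, dof = pearson_homogeneity(now, ref)
    if dof < 1:
        return False  # a single populated category cannot show a change
    return x > chi2_quantile_wh(dof, 1.0 - alpha)


print(mix_changed([100, 200, 300], [10, 20, 30]))     # False: same proportions
print(mix_changed([300, 200, 100], [100, 200, 300]))  # True: X = 200 > TH2
```

The approximation is within a few hundredths of the exact quantile for 10 degrees of freedom at $\alpha = 0.01$, and less accurate for very few degrees of freedom.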
If $X \le$ TH2, the null hypothesis is accepted and the traffic mix is deemed unchanged; the logs gathered in the last observation interval are then discarded. If the traffic mix is estimated to have changed, ACO is run, including the extraction of rules from "deny all." The performance improvement obtained does not necessarily justify uploading the new rule lists, so they are sent to the devices only if the improvement is large enough. This is realized by means of threshold TH3, expressing the minimum percentage cost reduction (costRed%) that triggers the upload of the new configuration into the DCDB and hence to the filtering devices (TH3 = 5%). The choice of TH3 is a trade-off between the benefits of the optimization and the costs of the configuration upload. These costs may be of different types, for example, unavailability of the device for a certain period of time (reset on upload), security issues, or reduced device redundancy. Note also that the benefits of the optimization may vary depending on the network devices and traffic, which is why TH3 should be chosen according to the specific scenario in which ACO is deployed.
A critical point for the feasibility of ACO automation is the expected time scale of traffic mix variations, which depends on the specific networking context. We address specific examples in the next section, where the traffic mix changes because of a DoS attack that abruptly introduces new packet flows in an attempt to saturate the input capacity of the packet filtering device.
As an example of "ordinary" traffic variation over time (not affected by DoS attacks), Figure 3 shows two traffic measurements taken from the operational public network of a tier-1 Italian ISP (input and output traffic profiles are plotted on the positive and negative ordinates, respectively). The top graph reports the http/https traffic impacting the web portal of a major company; the traffic profile refers to a single IP address/port number (80) and is plotted in packets/s. The bottom graph shows the UDP traffic impacting an authoritative DNS server (single IP address/port number 53).
In both cases, significant changes in the traffic volume of each flow clearly occur on a time scale of the order of hours. This relaxes the requirement on the observation time interval needed to collect a reliable statistical sample of logs and to determine when a significant change occurs. It also relaxes the computational power requirements for running ACO.

Performance Evaluation of ACO
We carried out an experimental evaluation of the benefits of rule set optimization and of rule extraction from "deny all." We set up the test-bed outlined in Figure 4, consisting of three Fast Ethernet subnets (physical link capacity: 100 Mbps). Two of them, net1 and net2, are connected by a single packet filtering device, an Amtec SAS 1000, referred to simply as the "filtering device" in the following. The device rules are configured so that only traffic between net1 and net2 is allowed. Attacking flows originate from net3 and all of them match the "deny all" rule; hence they have the maximum possible processing cost. The filtering device used in the experiments runs many security functions (e.g., known attack detection, activity logging), which makes the test-bed a close picture of a real operational environment but rules out simple mathematical modeling of CPU activity. We therefore run black box tests and take the packet loss ratio and the packet throughput of a tagged flow through the filtering device as key performance indicators.
In Section 6.1 we discuss tests aimed at evaluating the benefits of rule set optimization on the processing performance of a packet filtering device. Section 6.2 deals with the performance improvement obtained by means of rule extraction from "deny all", specifically its benefits in rejecting Denial of Service attacks.

Effect of Rule Cost Optimization.
To evaluate the performance improvement of the packet filtering device as a function of the position of rules inside the list, we generated a UDP flow from net1 to net2, whose carried rate (throughput) is $\Lambda_0$ when the rank of the rule matching that flow is $h_0$. In Figure 5 we plot the throughput gain $(\Lambda - \Lambda_0)/\Lambda_0$ as a function of the processing cost reduction $(h_0 - h)/h_0$, as the rank $h$ of the matching rule is decreased from $h_0$ to 1. Three different values of the inbound packet rate are considered. In all three cases, IP packets are 64 bytes long, $h_0 = 150$, and $\Lambda_0$ ranges between 3200 and 3428 packets/s. (Some dispersion of the experimental results is due to the well-known burstiness of traffic generated by means of IPERF [40].)
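Figure 5 uses the normalized rank reduction $(h_0 - h)/h_0$ as the processing cost reduction, i.e., the per-packet cost is taken to grow linearly with the rank of the matching rule. Under that assumption, the expected cost of a first-match list is the weight-by-rank sum sketched below in Python; the helper names are hypothetical.

```python
from typing import Sequence


def list_cost(weights: Sequence[float]) -> float:
    """Expected rule comparisons per packet in a first-match list:
    sum over rules of (fraction of packets matched) * (rank, from 1)."""
    return sum(w * rank for rank, w in enumerate(weights, start=1))


def cost_reduction(h0: int, h: int) -> float:
    """Relative cost reduction when a flow's matching rule moves from
    rank h0 to rank h."""
    return (h0 - h) / h0


print(round(cost_reduction(150, 1), 4))      # 0.9933 (rank 150 -> 1)
print(round(list_cost([0.5, 0.3, 0.2]), 2))  # 1.7
```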
The results in Figure 5 show that the percentage throughput improvement grows with the packet rate. This is a useful feature of ACO, since the need to lower the processing cost arises when the traffic intensity increases. Conversely, the lower the inbound packet rate, the smaller the optimization benefit.

DoS Rejection Capability via Extraction of Rules from "Deny All".
ACO can help relieve the effect of DoS and DDoS attacks on packet filtering devices. A Denial of Service (possibly Distributed DoS) attack aims at overloading the CPU of the device by throwing a large amount of traffic at it, consisting of flows not envisaged in the policy design. These flows are discarded by virtue of the "deny all" rule, but this requires the entire list to be checked before a decision is taken on each packet. Even cache based accelerators can be ineffective if a large number of different, undesired flows are thrown against a filtering device. That is not difficult to achieve, for example, by randomly changing the source port, destination port, protocol type, or source address fields. ACO rule extraction from "deny all" can provide aggregated rules able to match the undesired traffic. Those rules can be merged into the rule list by the optimization procedure, thereby accounting for their weight in terms of matched packets.
ACO cannot be the only defence against DoS/DDoS attacks, especially when the inbound link is saturated by anomalous traffic. In this case only the provider can definitively remove the effect of DoS/DDoS by disconnecting the malicious sources of traffic. Nevertheless, we show that ACO is effective in detecting and reacting to DoS/DDoS attacks by relieving the CPU load and protecting legitimate traffic.
Because of the limited number of associations that can be created and their single-flow nature, cache based acceleration of processing works best with static traffic patterns. If a big surge of traffic made up of a large number of different and varying flows hits the filtering device, cache association is essentially ineffective. Extraction of rules from "deny all" as carried out in ACO addresses this problem and complements the cache acceleration mechanism by minimizing the time needed to match a packet. This is obtained by extracting maximally aggregated deny rules from "deny all" and bringing each of them as close to the top of the rule set as dictated by the fraction of the inbound traffic hitting it.
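Rule extraction from "deny all" can be illustrated on a single selector field. The Python sketch below is a deliberate simplification, not the ACO procedure, which works on the full selector space: it assumes rules match destination-port ranges only, computes the maximal port ranges not covered by any rule (the "deny all" subspace), and keeps those actually hit by logged "deny all" packets, each with a weight normalized to that traffic. The field choice, port values, and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # inclusive range of one selector field


def complement(covered: Sequence[Interval], lo: int = 0, hi: int = 65535) -> List[Interval]:
    """Maximal sub-ranges of [lo, hi] not matched by any existing rule."""
    gaps: List[Interval] = []
    nxt = lo
    for a, b in sorted(covered):
        if a > nxt:
            gaps.append((nxt, a - 1))
        nxt = max(nxt, b + 1)
    if nxt <= hi:
        gaps.append((nxt, hi))
    return gaps


def extract_deny_rules(covered: Sequence[Interval], denied_ports: Sequence[int]):
    """Candidate deny rules: maximal gaps hit by 'deny all' traffic, each with
    its normalized weight (share of the logged 'deny all' packets)."""
    gaps = complement(covered)
    hits = [sum(a <= p <= b for p in denied_ports) for a, b in gaps]
    total = sum(hits) or 1
    return [(gap, h / total) for gap, h in zip(gaps, hits) if h > 0]


# Hypothetical rules for ports 22, 80, 443; attack traffic on scattered ports.
print(extract_deny_rules([(22, 22), (80, 80), (443, 443)], [10, 53, 1000, 2000, 3000]))
# [((0, 21), 0.2), ((23, 79), 0.2), ((444, 65535), 0.6)]
```

The three flows at ports 1000, 2000, and 3000, which could come from different sources, fall under one aggregated rule, so a single extracted rule is enough to catch them.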
The effectiveness of ACO is measured from a user point of view, as suggested in [41,42], by injecting an allowed flow into the security device and measuring its degradation under the DoS attack. The types of legitimate traffic considered in our test-bed are TCP and UDP flows, as in [43], and FTP transactions. To measure network performance we take the following key performance indicators: (i) long-term average net throughput for TCP and UDP; (ii) average file transfer speed (in Mbit/s) for FTP.
For each type of legitimate traffic we vary the DoS attacking flow bit rate from 1 Mbit/s up to 35 Mbit/s. Following a worst case scenario, we set the attacking flow packet size to 40 bytes, so that attacking flow packet rates range from 3124 packets/s for a bit rate of 1 Mbit/s up to 110655 packets/s for a bit rate of 35 Mbit/s.
Results are shown in Figures 6, 7, and 8 for TCP, UDP, and FTP traffic, respectively. Each experiment starts with the launch of a legitimate flow; let $t_0 = 0$ denote the start time of the experiment. All legitimate flows are set so that the filtering device processes them without any packet loss in the absence of a DoS attack. Performance worsening is due only to the onset of the attacking flow, starting from time $t_1 = 200$ s. At time $t_2 = 400$ s ACO is run: a rule that captures the DoS flow is extracted from "deny all" and the overall rule list is optimized as described in Section 4. (The numerical values of these times are chosen to ease graph display; the reaction time of the automated ACO algorithm is of the order of seconds; see Section 5.) The experiment run is stopped at time $t_3 = 600$ s.
When the attack starts, the performance of the legitimate flow degrades abruptly. After the extraction performed by ACO, it improves, in some cases getting back to the value observed prior to the attack. The legitimate flows react in different ways, according to the functionality of each protocol. For example, Figure 6 shows that TCP suffers a major throughput loss even under a relatively mild attack (3124 pkts/s), due to the shrinking of the TCP congestion window when packet loss is detected. After ACO extracts a rule filtering the attacking flow and optimizes the rule list, the device can process packets faster, thus reducing loss events and allowing TCP to attain a higher sending rate. The UDP case is completely different (Figure 7), since there is neither closed-loop congestion control nor datagram retransmission; here ACO extraction brings about a major performance improvement. The extraction phase of ACO is quite effective against the DoS attack in the FTP case as well, as shown in Figure 8.
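For each type of legitimate traffic and for each attack packet rate, we calculate the percentage improvement (PI) of the relevant performance indicator due to rule extraction from the "deny all" rule. The PI of a given performance indicator $I$ is computed from $E[I]$, the average value of $I$ before ACO execution, and $E[I']$, its average value after ACO execution. The two averages are taken over 200 s time intervals: $E[I]$ is the average of the performance indicator from $t = 200$ s up to $t = 400$ s, whereas $E[I']$ is computed by averaging the performance indicator from $t = 400$ s up to $t = 600$ s. In these experiments, we force the execution of ACO at time $t = 400$ s, so that a steady state is reached both before and after ACO execution.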
In Figure 9 the PI of the average download speed is plotted for FTP as a function of the attacking flow packet rate $\lambda$. The other cases are qualitatively similar. For DoS attacks at packet rates lower than about 6500 pkts/s the obtained PI is very low, so in those cases ACO rule extraction is not really needed. For larger values of the attack flow packet rate the PI grows, reaching a maximum for $\lambda \approx 62400$ pkts/s, and then decreases somewhat, still hovering around 60%. Even under a heavy attack, performing ACO rule extraction and optimization allows users to download a file via FTP more than twice as fast as with a nonoptimized rule list.
ACO can also be exploited against DDoS attacks, since the rules extracted from "deny all" include aggregates of flows: they are actually the most general rules that cover the selector parameter subspace complementary to the subspace of allowed flows. So, a small set of rules can deal with all possible DDoS flows. When DDoS attack flows, possibly generated from different sources, match a single extracted rule, ACO faces the distributed attack just as if it were a DoS attack from a single source. To demonstrate this robustness of our approach, we perform an experiment with the same test methodology and network scenario as before, except that three different attacking flows are generated in net3, originating from three different PCs. The attacking flows are such that a single extracted rule matches all of them. For space reasons we do not show all DDoS test results, but just the PI for every legitimate flow (Table 1). The aggregate bit rates of the attacking flows used in the experiments are as high as about 80 Mbit/s. Even against such powerful attacks, ACO rule extraction and optimization reaps a performance gain of up to about 60% (FTP case) with respect to the degradation due to the attack.

Conclusions
This work focuses on optimization techniques for packet filtering devices such as firewalls and security gateways. The basis of our proposal is the reduction of the packet processing cost based on the traffic observed on the network. Our tool collects traffic information by means of logs sent by the managed devices and exploits them to reorder the device rule set. Furthermore, it creates new rules extracted from the "deny all" rule to match input traffic flows that are not captured by other rules; this last feature can be useful against DoS/DDoS attacks. We have implemented ACO in an experimental testbed and measured its effect. Results show that rule reordering entails a tangible improvement of the processing performance of packet filtering devices. We have also tested the anti-DoS functionality of the ACO extraction phase, measuring the impact of attacks on legitimate traffic, and we have shown that, for attacks with a packet rate higher than a critical value, extracting rules from "deny all" allows legitimate users under attack to obtain a performance improvement between 30% and 60% in most cases.

Let us now consider the merged list $L$. Let $u_i \ge 0$ be the number of rules of $L_b$ that are placed in between $a_{i-1}$ and $a_i$, $i = 2, \ldots, m$; let $u_1 \ge 0$ be the number of rules of $L_b$ placed before $a_1$, and let $u_{m+1} \ge 0$ be the number of rules of $L_b$ placed after $a_m$. Similarly, let $v_j \ge 0$ be the number of rules of $L_a$ placed in between $b_{j-1}$ and $b_j$, $j = 2, \ldots, n$; let $v_1 \ge 0$ be the number of rules of $L_a$ placed before $b_1$, and let $v_{n+1} \ge 0$ be the number of rules of $L_a$ placed after $b_n$. Note that $u_1 + \cdots + u_{m+1} = n$ and $v_1 + \cdots + v_{n+1} = m$. Then the cost of $L$ can be expressed as a function of these quantities.

Figure 11: Device Pseudo-Tree associated with the rule list in Table 2. Red/green boxes denote deny/allow actions.

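Starting from state $(i, j)$, the two possible moves are compared, each valued as the cost of the move plus the cost of the optimal path from the state it leads to, and the rule corresponding to the cheaper move is selected. The complexity of the algorithm is linear with $m$ and $n$.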
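The state-based selection can be illustrated with a dynamic-programming sketch in Python. It assumes that the cost of a list is the sum of rule weights times ranks and that the two input lists are chains whose internal order must be kept; it is one way to compute an optimal merge, not necessarily the exact formulation used here, and the function name and rule labels are hypothetical. A state $(i, j)$ means that the first $i$ rules of one list and the first $j$ rules of the other have already been placed.

```python
from typing import List, Sequence, Tuple


def merge_min_cost(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[str]]:
    """Merge two rule chains (internal order preserved) minimizing
    sum(weight * rank). In state (i, j) the next rank is i + j + 1."""
    m, n = len(a), len(b)
    INF = float("inf")
    # D[i][j]: minimal cost of placing the remaining rules from state (i, j)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            if i == m and j == n:
                continue
            rank = i + j + 1
            take_a = a[i] * rank + D[i + 1][j] if i < m else INF
            take_b = b[j] * rank + D[i][j + 1] if j < n else INF
            D[i][j] = min(take_a, take_b)
    # Walk forward from (0, 0), selecting the cheaper move at each state.
    order: List[str] = []
    i = j = 0
    while i < m or j < n:
        rank = i + j + 1
        take_a = a[i] * rank + D[i + 1][j] if i < m else INF
        take_b = b[j] * rank + D[i][j + 1] if j < n else INF
        if take_a <= take_b:
            order.append(f"a{i + 1}")
            i += 1
        else:
            order.append(f"b{j + 1}")
            j += 1
    return D[0][0], order


cost, order = merge_min_cost([0.1, 0.5], [0.2])
print(round(cost, 3), order)  # 1.7 ['a1', 'a2', 'b1']
```

In this sketch the table fill takes $O(mn)$ steps, while the forward walk that selects the rules visits only $m + n$ states.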

B. Example of ACO Application
We develop a complete example of the application of ACO. An example of a conflict-free rule list that can be fed as input to ACO is given in Table 2.
Reordering must respect rule dependencies to avoid introducing conflicts. For example, if rule $r_8$ in Table 2 is brought to the top of the list because of its large cost, a conflict with rule $r_7$ arises, since the condition of $r_7$ is included in the condition of $r_8$ and their actions are opposite.
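The dependency check behind this example can be sketched in Python. In the sketch, a rule is a condition, given as one integer range per selector field, plus an action; moving a rule above another is allowed only if the two conditions are disjoint or the actions are equal. The rule encoding, the field names, the assumed field domain, and the conditions given to $r_7$ and $r_8$ are hypothetical, since the content of Table 2 is not reproduced here.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]
Rule = Tuple[Dict[str, Range], str]  # (condition, action)

FULL = (0, 2**32 - 1)  # assumed domain of a field left unspecified ("any")


def overlaps(c1: Dict[str, Range], c2: Dict[str, Range]) -> bool:
    """Two conditions overlap if the ranges intersect on every field."""
    for field in set(c1) | set(c2):
        lo1, hi1 = c1.get(field, FULL)
        lo2, hi2 = c2.get(field, FULL)
        if hi1 < lo2 or hi2 < lo1:
            return False
    return True


def can_swap(upper: Rule, lower: Rule) -> bool:
    """Moving `lower` above `upper` keeps the policy unchanged only if the
    actions coincide or the conditions are disjoint."""
    (c1, a1), (c2, a2) = upper, lower
    return a1 == a2 or not overlaps(c1, c2)


# Hypothetical rules: r7 is a special case of r8 with the opposite action.
r7 = ({"src": (10, 10), "dport": (80, 80)}, "allow")
r8 = ({"dport": (80, 80)}, "deny")
print(can_swap(r7, r8))                              # False: r8 would shadow r7
print(can_swap(({"dport": (22, 22)}, "allow"), r8))  # True: disjoint conditions
```

To move a rule up by several positions, the check has to pass against every rule it overtakes.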
The DPT for the rule list in Table 2 is depicted in Figure 11.The "deny all" rule has been put on top of the DPT, as it is the most general rule.
The DPT of Figure 11 is used to optimize the rule list of Table 2 with the weights shown in the last column of Table 2. Figure 12 shows the optimization process in four steps (from left to right, from top to bottom), which yields the final ordered, conflict-free, and optimized list $R^*$.

As an example of how a list of rules extracted from "deny all" can be created, we refer to the rule list in Table 2, whose DPT is shown in Figure 11. Table 3 illustrates the set of rules $r_{e,j}$ extractable from the "deny all" rule of the list in Table 2 and the associated normalized weights $w_{e,j}$. So, the list $L^*_e$ of candidate rules for extraction is $L^*_e = [r_{e,1}, r_{e,3}]$, since $r_{e,2}$ has zero weight.
If we apply "deny all" rule extraction to the rule list of Table 2, using the extracted rule set of Table 3, it turns out that there is no cost reduction. This is because rule extraction has a useful impact only if the number of packets matching "deny all" is a significant fraction of the overall packets dealt with by the filtering device; in the example of Table 2, "deny all" traffic accounts for just 10%. As another example, let us assume that the weight of "deny all" is $\omega$ and that the other weights stay the same, except that they are scaled to make the sum of all weights equal to 1. The new weights are denoted with a tilde and are shown in Table 4. The last column of Table 4 reports the difference between the cost of the list with rule $r_{e,1}$ inserted with rank $h$ and the cost of the rule list $R^*$, namely, $\Delta(h) \equiv C(R^* \cup r_{e,1} \text{ with rank } h) - C(R^*)$. The weight of the extracted rule is $w_{e,1} = 0.7$, according to the first line of Table 3. It is easily found that the most convenient rank for $r_{e,1}$ is $h = 1$ for all $\omega > 0.1792$, which is the intersection point between the straight lines corresponding to the first and fifth rules. For example, for $\omega = 0.25$, $C(R^* \cup r_{e,1} \text{ with rank } 1) = C(R^*) - 0.575$, where the cost of the list with no extracted rule is $C(R^*) = 5.46$.
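$\Delta(h)$ can be evaluated by direct recomputation, as in the Python sketch below. It assumes that the list cost is the sum of weight times rank, that the extracted rule can be placed at any rank above "deny all" without conflicts, and that its weight (a share of the "deny all" traffic, like the normalized weights of Table 3) is subtracted from that of "deny all". The function names and the weights in the example are hypothetical, not those of Table 4.

```python
from typing import List, Sequence


def list_cost(weights: Sequence[float]) -> float:
    """Sum of weight * rank (ranks start at 1)."""
    return sum(w * rank for rank, w in enumerate(weights, start=1))


def insertion_deltas(weights: Sequence[float], omega: float, share: float) -> List[float]:
    """Delta(h) for h = 1..len(weights)+1: cost change when an extracted rule
    taking `share` of the 'deny all' traffic is inserted at rank h.

    weights: weights of the rules above 'deny all', in list order
    omega:   weight of 'deny all' before extraction
    """
    base = list_cost(list(weights) + [omega])
    w_e = share * omega  # traffic moved from 'deny all' to the new rule
    deltas = []
    for h in range(1, len(weights) + 2):
        new = list(weights[: h - 1]) + [w_e] + list(weights[h - 1:]) + [omega - w_e]
        deltas.append(list_cost(new) - base)
    return deltas


d = insertion_deltas([0.3, 0.2, 0.1], omega=0.4, share=0.7)
best_rank = min(range(len(d)), key=d.__getitem__) + 1
print([round(x, 2) for x in d], best_rank)  # [-0.12, -0.14, -0.06, 0.12] 2
```

In this example the best rank places the new rule (weight 0.28) right after the rule of weight 0.3, i.e., where it keeps the weights in decreasing order; with dependencies among rules, some ranks may be forbidden.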

Figure 1: Overall architecture of the managed network system using the Adaptive Conflict-Free Optimization (ACO) module.

Figure 3: Examples of the time variation of the packet rate (pkts/s) of traffic flows in public operational networks (packets in: positive y; packets out: negative y): (a) http/https flow in a major company web portal; (b) DNS traffic towards an authoritative DNS server.

Figure 5: Filtering device throughput gain versus the tagged flow rule cost reduction for different values of the tagged flow inbound packet rate.

Figure 6: TCP sending rate sample path over 600 s, with DoS attack starting at time 200 s and ACO rule extraction and optimization carried out at time 400 s.

Figure 10: Graph for the optimization of the merging cost.

Figure 12: Optimization of the DPT of Figure 11 in four steps.
Figure 7: Sample path over 600 s of the ratio between sending and receiving rates of a UDP flow, with DoS attack starting at time 200 s and ACO rule extraction and optimization carried out at time 400 s.

Table 1: Percentage improvement (PI) of the efficiency parameter due to ACO rule extraction and optimization as a function of the attacking flow packet rate, for TCP, UDP, and FTP legitimate flows.

Table 2: Example of rule list with $R = 8$ ($r_9$ is the "deny all" rule).

Table 3: List of rules extractable from the "deny all" rule of the rule set in Table 2.