Intelligent Testing of Traffic Light Programs: Validation in Smart Mobility Scenarios

In smart cities, the use of intelligent automatic techniques to find efficient cycle programs for traffic lights is becoming an innovative front in traffic flow management. However, this automatic programming of traffic lights requires a validation process for the generated solutions, since they can affect the mobility (and safety) of millions of citizens. In this paper, we propose a validation strategy based on genetic algorithms and feature models for the automatic generation of different traffic scenarios that check the robustness of traffic light cycle programs. We have concentrated on an extensive urban area in the city of Malaga (Spain), in which we validate a set of candidate cycle programs generated by means of four optimization algorithms: Particle Swarm Optimization for Traffic Lights, Differential Evolution for Traffic Lights, random search, and SUMO Cycle Program Generator. We can test the cycles of traffic lights considering the different states of the city, weather, congestion, driver expertise, vehicle features, and so forth, while prioritizing the most relevant scenarios among a large and varied set of them. The improvement achieved in solution quality is remarkable, especially for CO2 emissions, in which we have obtained a reduction of 126.99% compared with the experts' solutions.


Introduction
Nowadays, all initiatives for the development of the Smart City [1][2][3] focus on public institutions because of the impact they have on society in general. A Smart City is a holistic initiative to manage the city where actions and services, based on information technology, are optimally conceived and deployed for sustainable living. Traffic flow management is one of the most important aspects in the context of smart cities due to the large number of vehicles to be managed in the metropolitan area. Optimal traffic management can minimize journey times and reduce fuel consumption and harmful emissions. For this purpose, the cycle programming of traffic lights constitutes a key task [4]. Nevertheless, the large number of cycle combinations that appear in current traffic light schedules requires automatic intelligent tools to be used by the experts in this field.
In this regard, several research studies on automatic traffic light scheduling exist which use different intelligent techniques such as fuzzy logic controllers [5,6], multiagent systems [7], neural networks [8], and metaheuristic algorithms [9][10][11]. However, these solutions depend heavily on the scenario instances under examination, which have specific conditions and limited variability. Furthermore, the variability of the traffic system is due, among other factors of varying importance, to different weather conditions, weekday versus weekend traffic, rush hours, a changing environment, and so forth. Therefore, our main objective in this paper is to offer a strategy for validating the generated cycle programs on different and variable traffic scenarios, since these scenarios can reveal to the experts how robust the proposed cycle programs of traffic lights are.
With the aim of representing high variability systems (like vehicular traffic scenarios), feature models (FMs) [12] have emerged in the last decade as a standard strategy for modeling common and variable features of a system and their relationships. FMs have been applied to test Software Product Lines (SPL) [13][14][15] with great success, although they have still not been considered for representing features in other domains. Our motivation is therefore to develop a model for traffic management that represents all feature variability for the generation of different scenarios of urban mobility. In this regard, we aim to extract validated information about which traffic light program (from among a number of them) is best adapted to most traffic scenarios with different feature combinations and which fails to manage the traffic flow under other specific feature combinations.
In fact, among the main features that affect traffic flow, we can enumerate the following: different meteorological conditions, type of vehicle, driver expertise, simulation time, and so forth. However, the analysis of all possible traffic scenarios is infeasible due to the large number of feature combinations to take into account. For this reason, we propose an evolutionary approach, called the Prioritized Genetic Feature Model (PGFM) algorithm, designed for the generation of a minimal prioritized test suite of traffic scenarios that meets a coverage criterion (pairwise). The pairwise coverage criterion [16] is the most popular in the literature. Since not all scenarios are equally important, our feature model has been extended to provide priorities for the features. Therefore, when there are time or cost constraints, this prioritization allows us to generate first the most important scenarios in which to test the traffic light programs.
In this paper, we focus on an extensive urban area of Malaga city (Spain) for the case study. We use the well-known SUMO (Simulation of Urban Mobility) [17] traffic simulator, which offers a continuous source of information about the vehicle flow: velocity, fuel consumption, emissions, journey time, and so forth, giving rise to configurations of realistic scenarios according to real patterns of mobility. In particular, we validate a large number of traffic light programs previously generated by four optimization algorithms: two metaheuristic techniques, PSOTL [9] and DETL [9]; a blind stochastic search algorithm, random search (RS); and a deterministic procedure following human experts' rules for traffic configuration, SCPG [17]. Our present work will help determine the suitability of the computed configurations for a wider set of city scenarios for which they have not been trained. The main contributions of this paper are summarized as follows:
(i) Design and application of a prioritized feature model (PFM) for the representation of traffic light management systems: as far as we know, this is the first application of a PFM to traffic light management systems. It constitutes an actual use of cross-fertilizing techniques, since we take advantage of prioritized feature models, typically applied to software testing, to leverage traffic and environmental solutions in current smart cities (like Malaga). This model represents traffic variability through different features weighted according to their probability of occurrence and importance. This testing technique is useful here to identify which cycle programs fail to manage the traffic flow in different scenarios.
(ii) Development of the Prioritized Genetic algorithm for feature models (PGFM), for the automatic generation of weighted traffic scenarios: PGFM is shown here to comply with the pairwise coverage criterion [16]. In this way, we are able to work with a reduced set of scenarios that cover all the different traffic features considered in our PFM.
(iii) Thorough analysis and comparison of the different traffic light programs for the Malaga city case study: these cycle programs were generated as candidate solutions by means of four optimization algorithms: PSOTL, DETL, RS, and SCPG (human experts' solutions). Although these solutions were obtained for a specific traffic scenario, they have been validated for multiple scenarios covering our PFM.
The remainder of the paper is organized as follows. Section 2 presents an overview of related work in the literature. In Section 3, basic concepts are explained for the sake of a better understanding of the work presented here. After this, Section 4 details the validation strategy. Section 5 describes how traffic scenario features are represented by means of our feature model. Section 6 outlines the PGFM algorithm and presents the generated test suite of traffic scenarios. Then, Section 7 is devoted to presenting the Malaga city case study. In Section 8, the experimental procedure and the analysis of results are addressed. In Section 9, we provide a deeper analysis, focusing on the prioritization and the ranking of the solutions. Section 10 discusses the threats to validity. Finally, Section 11 outlines some concluding remarks and future work.

Related Work
This section presents an overview of related work considering a two-pronged approach: combinatorial testing for feature models and optimization of traffic light cycle programs. The former focuses on intelligent techniques for combinatorial interaction testing (CIT), using covering arrays. The latter considers the use of metaheuristics for traffic optimization.
CIT [16] is an effective testing approach for detecting failures caused by certain combinations of components or input values. Generally, this task consists of generating, at least, all possible combinations of the parameters' values (this task is NP-hard [18]). A large number of CIT approaches have already been published. A good overview and classification of approaches can be found in [19,20] and more recently in [21]. In addition, an extensive survey that focuses on CIT with constraints is given in [22]. To the best of our knowledge, only two approaches support prioritized test data generation: the Deterministic Density Algorithm (DDA) presented in [23] and an approach based on Binary Decision Diagrams (BDD) [24]. However, neither of them has been applied to the prioritization of feature models. Here, we use the priorities of the traffic features to generate prioritized scenarios for testing traffic light programs.
Except for a few efforts, the application of bioinspired techniques to combinatorial interaction testing with feature models remains largely unexplored. Garvin et al. applied simulated annealing to CIT for computing n-wise coverage for feature models [25]. Ensan et al. also proposed a genetic algorithm approach for test case generation [26]. In contrast with the work proposed here, they use as fitness function a variation of the cyclomatic complexity metric adapted to feature models; their goal is not n-wise coverage and they do not consider priorities. Henard et al. proposed an approach that uses a dissimilarity metric that favors individuals whose n-wise coverage varies the most from the current population, thus increasing the chances of wider coverage [27]. A key difference with our work is that their prioritization is not based on assigned real-valued weights. Recent surveys on feature models [28,29] attest to the increasing relevance of the topic within the community but also confirm that the latent potential of search-based testing techniques remains largely untapped.
The generation of test suites from feature models has recently been examined in several studies. Oster et al. proposed MoSo-PoLiTe [30], an approach that translates feature models and their constraints into binary constraint solver problems (CSP), from which they compute pairwise covering arrays. Hervieu et al. developed a tool called PACOGEN that also relies on constraint programming for computing pairwise coverage from feature models [31]. Johansen et al. [15] proposed a greedy approach to generate n-wise test suites that adapts Chvátal's algorithm for solving the set cover problem. In the work presented here, a test suite is also generated, although it is composed of the different traffic scenarios in which traffic light programs are tested.
In terms of traffic optimization, metaheuristic algorithms [32] have become very popular for solving traffic light staging problems. A first attempt was proposed by Rouphail et al. [33], where a genetic algorithm (GA) was coupled with the CORSIM [34] microsimulator for the timing optimization of nine intersections in the city of Chicago (USA). Following the model proposed in Brockfeld et al. [35], Sánchez et al. [10] designed a GA with the objective of optimizing the cycle programming of traffic lights in a commercial area in the city of Santa Cruz de Tenerife (Spain). In that approach, the computation of valid states was done before the algorithm began, and it highly depended on the scenario instance tackled. A GA was also used by Turky et al. [36] to improve the performance of traffic lights and pedestrian crossing control in a single four-way, two-lane intersection.
Peng et al. [37] presented a particle swarm optimization (PSO) with isolation niches for the scheduling of traffic lights. In that approach, a purely academic instance with a restrictive one-way road with two intersections was used to test the proposal. Kachroudi and Bhouri [38] applied a multiobjective version of PSO for optimizing cycle programs using a predictive control model based on a public transport progression model. In that approach, private and public vehicle models were used to carry out simulations on a virtual urban road network made up of 16 intersections and 51 links. Kesur [39] used the nondominated sorting genetic algorithm (NSGA-II) to evaluate two conflicting objectives: delay and number of stops. That work focuses on evaluating different modifications of the NSGA-II algorithm, and the traffic model used was the microscopic stochastic traffic network simulation proposed by the same author. García-Nieto et al. [9,40] proposed a PSO approach with the SUMO microsimulator for traffic light programming in large realistic urban areas, like Malaga and Seville (Spain) and Bahía Blanca (Argentina). For all the scenarios, this approach considered hundreds of traffic lights and circulating vehicles. In addition, these last works [9,40] evaluated other tools like SCOOT, which use different traffic light scheduling schemes on different instances adapted to very specific use cases. Performing comparative studies with these tools is out of the scope of the work proposed here, where the main goal is the validation of traffic light plans in a certain urban area under different environmental conditions.
In summary, the aforementioned approaches have focused on different aspects of traffic light programming. However, a common limitation in all of them is that, in the optimization phase, they considered just one scenario with specific traffic conditions. This means that the resulting cycle programs could be strongly biased toward the optimized scenario instances, so even a slight change in the traffic conditions (e.g., a change in the weather) might cause the cycle program to stop working properly.
In contrast, our initiative stands for cross-fertilization of the two domains involved here: Feature Modeling and Traffic Light Optimization. We consider a number of different traffic conditions in scenarios, covering all possible features according to our designed feature model. By means of our approach, the cycle programs are empirically validated, thereby providing the experts in traffic management with robust and generalist solutions.

Background
In this work, we do not focus on the simulation model; hence a comparison of simulation algorithms is out of the scope of this study, although we do use one simulator (SUMO) to measure the quality of the traffic light programs. We focus on the optimization and validation models. In this regard, the validation model has been specified as generally as possible so that it can be used with multiple general-purpose optimization algorithms. With this aim, we have tested two metaheuristic algorithms (PSO and DE), a stochastic random search algorithm, and a deterministic traffic scheduler, SCPG.
Before describing our validation strategy, several required concepts are introduced: the SUMO traffic simulator, which is used to measure the quality of the proposed solutions; the cycle program optimization of traffic lights; the optimization algorithms used to generate traffic light programs; and the feature models proposed here.

SUMO Traffic Simulator.
SUMO (Simulation of Urban Mobility) [17] is a well-known traffic simulator that provides an open source, highly portable, and microscopic road traffic simulation tool designed to handle large road scenarios. SUMO requires several input files that contain information about the traffic and the streets to be simulated. A journey is a vehicle movement from one location to another defined by the starting edge (street), the destination edge, and the departure time. A route is an extended journey, meaning that a route definition contains not only the first and last edges but also all the edges the vehicle will pass through. Additional files can be added to SUMO containing the map or traffic light positions and cycles. It is important to note that SUMO by default provides, inside the map specification file, the valid combinations of states that the traffic light controller can go through, together with an approximation of the interval times for these states [41]. This means that SUMO already incorporates a solver (SCPG) for the cycle program of traffic lights based on greedy rules and human knowledge.
The output of a SUMO simulation is registered in a journey information file that contains information about each vehicle's departure time, the time the vehicle waited to set off (offset), the time the vehicle arrived, the duration of its journey, and the number of steps in which the vehicle speed was below 0.1 m/s (temporary stops in driving). Other output files gather information about vehicle emission traces (CO2, NOx, PM, etc.) and hydrocarbon consumption. This information is used to evaluate the quality of alternative traffic light cycle programs.

Cycle Program Optimization of Traffic Lights.
The objective in this problem is to find optimized cycle (timing) programs for all the traffic lights located in a given urban area with the aim of reducing the global journey time, emissions, and fuel consumption. Specifically, a cycle program defines the time spans during which the set of traffic lights at an intersection keep each of their color states. This is the first step of our work: obtaining configurations of the city traffic lights that optimize these metrics. The second step is to find the best of them in terms of generalization for different city conditions (weather, traffic, driver, and vehicle conditions).
An example of this mechanism can be observed in Figure 1, where the intersection with id = "i" contains seven phases with durations of 40, 5, 40, 10, 36, 6, and 22 seconds. In these phases, the states have twelve signals (colors), each one of them corresponding to one of the twelve signal lights located in the intersection being studied (dashed circles indicate where traffic lights are located in the real intersection). These states are the valid ones generated by SUMO [42] obeying real traffic rules. In this instance, the fifth phase contains the state "Grr GGGr rrG GG", meaning that seven traffic lights are green (G) and the other five are red (r) for 36 seconds. The following phase changes the state of the traffic lights to another valid combination, for example, "yGG yyyG GGy yy" (y means yellow) for 6 seconds. The last phase is followed by the first one, and this cycle is repeated for the entire analysis. All the intersections in the complete scenario perform their own cycles of phases at the same time, thereby comprising the global schedule of signal lights. As mentioned before, computing the Signal Light Timing Program (SLTP) is based on optimizing the combination of phase durations of all traffic lights (in all intersections) with the intention of improving the global flow of vehicles.
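The cyclic behavior described for this intersection can be illustrated with a minimal sketch. The phase durations and the two quoted state strings come from the text; the function names and the dictionary layout are assumptions made only for this illustration and are not part of SUMO.

```python
from itertools import accumulate

# Phase durations (seconds) of the example intersection, as listed in the text.
DURATIONS = [40, 5, 40, 10, 36, 6, 22]

# Only the fifth and sixth states are quoted in the text (0-based keys 4 and 5).
STATES = {4: "Grr GGGr rrG GG", 5: "yGG yyyG GGy yy"}


def phase_at(t, durations=DURATIONS):
    """Return the 0-based index of the phase active at simulation second t."""
    cycle = sum(durations)
    offset = t % cycle  # the last phase is followed by the first one
    for index, end in enumerate(accumulate(durations)):
        if offset < end:
            return index
    raise AssertionError("unreachable: offset is always below the cycle length")


def count_signals(state):
    """Count green (G), red (r), and yellow (y) signals in a state string."""
    signals = state.replace(" ", "")
    return {color: signals.count(color) for color in "Gry"}
```

Under these assumptions the full cycle lasts 159 seconds, and second 100 falls inside the fifth phase, whose state holds seven green and five red signals.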
A formal definition of the optimal SLTP is as follows.
Let $I = \{i_1, \ldots, i_n\}$ be a set of intersections, where each intersection $i_j$ has a different set of phases $P_j = \{p_1, \ldots, p_m\}$ and each $p_k$ represents the timespan that the set of traffic lights in intersection $i_j$ keep one valid state of light colors (e.g., "Grr GGGr rrG GG"); find a program $P'$ that minimizes a scoring function $\Theta : \Gamma \to \mathbb{R}$ such that
$$\Theta(P') \le \Theta(P), \quad \forall P \in \Gamma, \qquad (1)$$
where $\Gamma$ is the space of all possible combinations of cycle programs and $P$ is a given program of $\Gamma$.
Since the timespans of phase durations are calculated in seconds (as done in real traffic lights), $\Gamma$ can be represented with a tuple of positive integer numbers $\mathbb{Z}^+$. Then, the number of possible program combinations, that is, the solution space size, can be calculated as $d^{\sum_{j=1}^{n} |P_j|}$, $d$ being the number of values in the interval of possible timespans. This way, as the interval was set to $[5, 60]$ (taken as $d = 55$) and the worked instances had 304 phases (at least), the problem solution space would consist of $55^{304} \approx 1.18 \cdot 10^{529}$ combinations. Therefore, efficient automated approaches are required to tackle it. The number of phases (304) is automatically computed according to the traffic model by the SUMO simulator when a problem instance is generated.
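The order of magnitude quoted above can be checked with exact integer arithmetic. The following is a minimal sketch; the helper name `magnitude` is hypothetical. Note that the text uses 55 values per phase, whereas the interval [5, 60] contains 56 integers if both endpoints are counted, so both bases are computed.

```python
LOW, HIGH = 5, 60  # interval of admissible phase durations (seconds)
PHASES = 304       # number of phases reported for the studied instance

# The text uses 55 values per phase; [5, 60] holds 56 integers when both
# endpoints are included, so both bases are kept for comparison.
VALUES_IN_TEXT = HIGH - LOW
VALUES_INCLUSIVE = HIGH - LOW + 1


def magnitude(base, exponent):
    """Return (mantissa, power of ten) of base**exponent using exact integers."""
    digits = str(base ** exponent)
    mantissa = float(digits[0] + "." + digits[1:6])
    return mantissa, len(digits) - 1


mantissa_text, power_text = magnitude(VALUES_IN_TEXT, PHASES)
mantissa_incl, power_incl = magnitude(VALUES_INCLUSIVE, PHASES)
```

With base 55 the result has a power of ten of 529, in agreement with the figure given in the text; counting both endpoints raises the exponent by about two orders of magnitude, which does not change the conclusion that exhaustive search is out of reach.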

Optimization Algorithms for Cycle Programming.
In this subsection we briefly describe the algorithms proposed for the optimization of cycle programs of traffic lights.

3.3.1. PSOTL.
This is a particle swarm optimization algorithm for finding quasi-optimal cycle programs for traffic lights. In PSOTL, the initial swarm is composed of a number of particles (solutions) initialized with a set of random values representing the phase durations. These values are within the time interval $[5, 60] \subseteq \mathbb{Z}^+$ and constitute the range of possible time spans (in seconds) a traffic light can be kept on a signal color (only green or red; the time for yellow is a constant value). This interval is specified to follow several examples of real traffic light programs provided by Malaga's City Council (Spain).
Since the optimal SLTP requires solutions encoded with a vector of integers (representing phase durations in seconds), we have used the quantisation method provided in the standard specification of PSO 2011 [43]. This quantisation is applied to each newly generated particle and transforms the continuous values of particles to discrete ones. It consists of a mid-tread uniform quantiser as specified in (2), which in its standard form reads
$$Q(x) = \Delta \cdot \lfloor x/\Delta + 0.5 \rfloor. \qquad (2)$$
The quantum step is set here to $\Delta = 0.5$. Algorithm 1 describes the pseudocode of PSOTL. The algorithm starts by initializing the swarm (Line 1), which includes both positions and velocities of the particles. The corresponding personal best of each particle is randomly initialized, and the leader is computed as the best particle of the swarm. Then, for a maximum number of iterations, each particle is updated (Line 4), quantised (Line 5), and evaluated (Line 6), according to a scenario instance (city). At the end of each iteration, the leader is also updated (Line 8). Finally, the best solution found so far (the cycle program encoded in the leader) is returned.
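As an illustration of the quantisation step, the following minimal sketch implements a mid-tread uniform quantiser in its standard textbook form, which is assumed here to be the one intended by (2). The clamping to [5, 60] and the function names are assumptions added for the example and are not taken from the PSO 2011 specification.

```python
import math


def quantise(x, delta=0.5):
    """Mid-tread uniform quantiser: snap x to the nearest multiple of delta."""
    return delta * math.floor(x / delta + 0.5)


def quantise_particle(position, delta=0.5, low=5, high=60):
    """Quantise every component, then clamp it to the admissible interval.

    The clamping step is an assumption of this sketch.
    """
    return [min(high, max(low, quantise(x, delta))) for x in position]
```

For instance, `quantise(7.3)` yields 7.5 and `quantise(7.2)` yields 7.0 with the quantum step of 0.5 used in the text.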

DETL.
This algorithm also performs a population-based search, but in this case it follows the operation scheme specified by the Differential Evolution algorithm (version DE/rand/1) [44].
As shown in Algorithm 2, after initializing the population (Line 1), the individuals evolve for a number of iterations by applying differential operators (Line 4). After this, solutions are quantised (Line 5) and evaluated (Line 6), according to a scenario instance. At the end of each iteration, each individual is either selected or not (Line 7) depending on whether it outperforms the previous one in the evolution procedure. Finally, the best solution (cycle program in individual V) found so far is returned.
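The loop just described can be sketched as follows. This is only an illustrative DE/rand/1 scheme with binomial crossover, not the authors' DETL implementation: the cost function `toy_cost` is a hypothetical stand-in for a SUMO-based evaluation, the parameter values (`f`, `cr`, population size) are arbitrary assumptions, and the quantisation is simplified to integer rounding with clamping.

```python
import math
import random

LOW, HIGH = 5, 60  # admissible phase durations in seconds


def toy_cost(program):
    """Hypothetical stand-in for a simulator-based evaluation (not from the paper)."""
    return sum((d - 30) ** 2 for d in program)


def quantise(x):
    """Round to the nearest integer and clamp to [LOW, HIGH] (simplified)."""
    return min(HIGH, max(LOW, math.floor(x + 0.5)))


def detl_sketch(cost, dims, pop_size=20, iterations=100, f=0.5, cr=0.9, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(LOW, HIGH) for _ in range(dims)] for _ in range(pop_size)]
    costs = [cost(ind) for ind in pop]
    for _ in range(iterations):
        for i in range(pop_size):
            # DE/rand/1: three distinct individuals, all different from i
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dims)  # at least one mutated component
            trial = []
            for j in range(dims):
                if j == forced or rng.random() < cr:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                else:
                    value = pop[i][j]
                trial.append(quantise(value))
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # selection: keep the better of the two
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]
```

A call such as `detl_sketch(toy_cost, dims=5)` returns the best vector of phase durations found and its cost; in a real setting the cost would come from running the simulator on the scenario instance.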

RS.
Random search is included here not as a serious competitor but as a sanity check to find out whether an intelligent algorithm is actually needed. In RS, at each iteration step, a new solution vector of integer variables is randomly generated (uniformly) in the range [5, 60]. The new individual replaces the previous one if it is better.
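A minimal sketch of this baseline is given below. The function name, the iteration budget, and the cost function passed in are assumptions for illustration; in the study the evaluation would be performed by the traffic simulator.

```python
import random


def random_search(cost, dims, iterations=1000, low=5, high=60, seed=0):
    """Keep the best of uniformly sampled integer vectors in [low, high]."""
    rng = random.Random(seed)
    best = [rng.randint(low, high) for _ in range(dims)]
    best_cost = cost(best)
    for _ in range(iterations):
        candidate = [rng.randint(low, high) for _ in range(dims)]
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:  # replace only if strictly better
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

Because each candidate is drawn independently of the current best, the procedure uses no information from previous evaluations, which is what makes it a useful sanity check for the metaheuristics.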

SCPG.
This is a deterministic algorithm provided by the SUMO [42] package for generating cycle programs. This technique incorporates actual scheduling information used by human experts in the domain. Cycle programs generated by SCPG are actually running our city traffic light systems at present. This algorithm consists of assigning fresh values in the range [6, 31] to the phase durations according to three different factors: (1) the proportion of green states in the phases, (2) the number of incoming lanes into the intersection, and (3) the braking time of the vehicles approaching the traffic lights.
3.4. Feature Models. Feature models (FMs) provide a compact representation for modeling the common and variable features of a system, their relationships, and the constraints between them. An FM can be understood as a tree of concepts, where the nodes are the features (depicted as labeled boxes) and the edges represent the relationships between them. Thus, an FM denotes a set of feature combinations. In an FM, each feature (except the root) has one parent feature and can have a set of child features. Note here that a child feature can only be included in a feature combination of a valid product if its parent is included as well. The root feature is always included. In order to illustrate these concepts, in Figure 2 we show an instance from the SPLOT repository [45] of a basic car. This instance defines 24 different configurations and four kinds of feature relationships:
(i) Mandatory features are depicted as filled circles. A mandatory feature is selected whenever its respective parent feature is selected. Features called Transmission (Ts), Car Body (Cb), and Engine (Eg) are mandatory.
(ii) Optional features are depicted with an empty circle. An optional feature may or may not be selected if its respective parent feature is selected. Features Air Conditioning (Ac) and GPS (Gp) are optional.
(iii) Exclusive-or relationships are depicted as empty circles crossed by a set of lines connecting a parent feature with its child features. They indicate that exactly one of the features in the exclusive-or group must be selected whenever the parent feature is selected. If feature Transmission (Ts) is selected, then either feature Manual (Ma) or feature Automatic (Au) must be selected.
(iv) Inclusive-or relationships are depicted as filled circles crossed by a set of lines connecting a parent feature with its child features. They indicate that at least one of the features in the inclusive-or group must be selected if the parent is selected. If feature Engine (Eg) is selected, then at least one of the features Electric (El) or Gasoline (Ga) must be selected.
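The four relationship types can be made concrete by enumerating the configurations of the basic car example. The sketch below assumes that the mandatory features (Ts, Cb, Eg) are children of the root and therefore always present, that Ac and Gp are optional children of the root, and that there are no cross-tree constraints; under these assumptions the enumeration yields the 24 configurations mentioned in the text. The function names are hypothetical.

```python
from itertools import product

# Variable features only; mandatory features are implicitly always selected.
FEATURES = ["Ma", "Au", "El", "Ga", "Ac", "Gp"]


def is_valid(cfg):
    """cfg maps each variable feature to True (selected) or False (unselected)."""
    exactly_one_transmission = cfg["Ma"] != cfg["Au"]  # exclusive-or group
    at_least_one_engine = cfg["El"] or cfg["Ga"]       # inclusive-or group
    return exactly_one_transmission and at_least_one_engine


def valid_configurations():
    """Enumerate every valid selection of the variable features."""
    configs = []
    for bits in product([False, True], repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, bits))
        if is_valid(cfg):
            configs.append(cfg)
    return configs
```

The count follows from multiplying the choices of each group: two transmissions, three engine combinations, and two options each for Ac and Gp.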
Besides the parent-child relationships, features can also relate across different branches of the FM with the so-called Cross-Tree Constraints (CTC). These constraints, as well as those implied by the hierarchical relationships between features, are usually expressed and checked using propositional logic [46]. FMs can be extended to take feature priority into account. In our example, we indicate the weight associated with the optional features as a percentage; the mandatory features are always present and therefore do not need to be weighted. In the following paragraphs, we summarize the basic terminology related to feature models that we use in this paper.
A configuration is a pair $[sel, \overline{sel}]$ where $sel$ and $\overline{sel}$ are the sets of selected and unselected features, respectively. In this paper, a configuration will be a traffic scenario. Regarding priorities, each feature $f$ has a weight $w_f$ in the range $[0, 1]$. Then, the weight $w_{\bar{f}}$ of the unselected feature is $1 - w_f$. A prioritized pair $pc$ is composed of two features $f_1$ and $f_2$ and a weight, such that the pair weight is calculated as the product of the weights of $f_1$ and $f_2$: $w_{pc} = w_{f_1} \cdot w_{f_2}$. Consequently, the configuration priority is calculated as the sum of the weights of the prioritized feature pairs that are present in the configuration. It should be noted that a configuration is valid in an FM iff it does not contradict any implicit or explicit constraints introduced by the FM.
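As a worked illustration of these definitions, the following minimal sketch computes the priority of a configuration. It reads "pairs present in the configuration" as all pairs of feature literals, where an unselected feature contributes the complementary weight; this reading, the function names, and the example weights are assumptions made for the illustration.

```python
from itertools import combinations


def literal_weight(feature, selected, weights):
    """Weight of a feature literal: w if selected, 1 - w if unselected."""
    w = weights[feature]
    return w if feature in selected else 1.0 - w


def configuration_priority(selected, weights):
    """Sum of pair weights over all feature pairs of the configuration."""
    return sum(
        literal_weight(f1, selected, weights) * literal_weight(f2, selected, weights)
        for f1, f2 in combinations(sorted(weights), 2)
    )


# Hypothetical weights, chosen only to keep the arithmetic exact.
EXAMPLE_WEIGHTS = {"Rainy": 0.25, "Sunny": 0.75, "Emergency": 0.5}
```

For the configuration that selects only Sunny, the three pairs weigh 0.5 · 0.75, 0.5 · 0.75, and 0.75 · 0.75, which sum to 1.3125.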
Combinatorial interaction testing (CIT) is a constructive approach that builds test suites in order to systematically test all configurations of a system [16]. In this paper we use the pairwise coverage criterion, which is the most popular method in CIT and is based on the assumption that most errors originating in a parameter are caused by the interaction of two values [47]. This criterion is satisfied if all feature pairs, that is, $(f_1, f_2)$, $(f_1, \bar{f}_2)$, $(\bar{f}_1, f_2)$, and $(\bar{f}_1, \bar{f}_2)$, are present in at least one configuration of the test suite. When this technique is applied to FMs, the idea is to select a set of valid configurations where the errors could be present with higher probability. In addition, the prioritization of the features makes it possible to test the more frequent or more important features first.
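The pairwise criterion can be checked mechanically. The sketch below is a minimal illustration with hypothetical function names: it collects the literal pairs that occur in a set of configurations and verifies that a test suite covers every pair achievable by some valid configuration. The unconstrained three-feature example is an assumption used only to show that four configurations can cover what eight exhaustive ones do.

```python
from itertools import combinations, product


def covered_pairs(configs, features):
    """Set of literal pairs ((f1, value1), (f2, value2)) occurring in configs."""
    pairs = set()
    for cfg in configs:
        for f1, f2 in combinations(features, 2):
            pairs.add(((f1, cfg[f1]), (f2, cfg[f2])))
    return pairs


def is_pairwise_covering(suite, valid_configs, features):
    """True if the suite covers every pair present in some valid configuration."""
    return covered_pairs(valid_configs, features) <= covered_pairs(suite, features)


def all_configurations(features):
    """All selections of unconstrained boolean features (illustrative only)."""
    return [dict(zip(features, bits))
            for bits in product([False, True], repeat=len(features))]


FEATURES = ["A", "B", "C"]
SUITE = [
    {"A": False, "B": False, "C": False},
    {"A": False, "B": True, "C": True},
    {"A": True, "B": False, "C": True},
    {"A": True, "B": True, "C": False},
]
```

Here `is_pairwise_covering(SUITE, all_configurations(FEATURES), FEATURES)` holds, while dropping any single configuration from `SUITE` breaks the coverage.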

Validation Strategy
Having explained the required preliminary concepts, we now describe the main steps involved in our validation strategy. Figure 3 illustrates the general scheme of this procedure, which consists of four main phases: Cycle Programs Optimization, Feature Model Design, Scenarios Generation, and Cycle Programs Validation. A brief explanation of how the four phases of our validation strategy are carried out is given in the following:
(1) Optimization. As described in Section 3, given an urban scenario related to a certain area in a real city, the cycle programs of traffic lights operating in this area are optimized by means of several specialized algorithms: PSOTL, DETL, RS, and SCPG, described in Section 3.3. As most of these algorithms are based on stochastic procedures, a number of different candidate solutions (representing traffic light programs) appear that must be thoroughly analyzed, with the aim of selecting the most accurate one for a wide range of traffic conditions. Note that we have used a neutral scenario for the optimization of the solutions (traffic light programs). Later, in phase 4 (Validation), we validate these solutions with multiple scenarios with different characteristics generated in phase 3 (Generation).
(2) Modeling. This phase entails the feature model (Section 3.4) design for our traffic scenarios. At this stage, the human expert has to select the features and constraints and decide how the priorities are assigned to the features. More details about the traffic modeling are given in Section 5.
(3) Generation. The generation of scenarios is automatically carried out by means of our Prioritized Genetic algorithm for feature models (detailed in Section 6). In this way, we can obtain a test suite of scenarios that fulfill the pairwise coverage criterion. In addition, the generated scenarios are ordered by priority according to the combination of features that are involved in the specific urban area. Note that the scenarios generated are only used for the validation and not for creating better schedules by the optimizers.
(4) Validation. Finally, the optimized cycle programs are validated against the scenarios generated from our traffic FM. The experimental procedure for accomplishing this phase is described in the experimental section (Section 8), and the resulting valid cycle programs are analyzed according to the stakeholder interests.
In addition, there exist some dependencies between these phases that have to be considered; for example, the traffic scenarios cannot be generated unless the traffic feature model has first been defined, whereas the cycle program optimization phase can be done in parallel with the definition of the feature model and the generation of the scenarios. The optimization phase is independent of the definition of the traffic FM because we optimize the cycle programs on a neutral scenario, which is not affected by any feature defined by the traffic FM. Finally, the cycle program validation depends on the generated traffic light programs and the traffic scenarios.
After applying these steps, we are able to numerically evaluate a set of cycle programs of traffic lights, in order to objectively select the overall best taking into account several scenarios. With this goal, we use some traffic measures in order to evaluate a cycle program of traffic lights according to the desirable behavior of the whole traffic system. In other words, we select the best cycle program depending on the stakeholder interest, such as saving fuel, reducing emissions, or avoiding traffic jams.

Modeling: Traffic Representation with Feature Models
Traffic management is a hard task that involves many varying features. Traffic is a highly variable system that can give rise to a wide range of possible scenarios. Therefore, it can be represented by a feature model, which is used for modeling the common and variable features of a system and their relationships. In addition, not every traffic scenario has the same importance or probability of occurrence. So, we have extended the FM to prioritize features in order to first test the most important scenarios. Figure 4 shows the generated FM for representing the traffic management system. Among the main features that could describe a traffic scenario, we have taken into account several conditions affecting the traffic, such as the weather, whether there is an emergency situation, and the different vehicle characteristics. Among these vehicle features, we have the number of vehicles, type of vehicle, driver imperfection, and driver reaction time. In the FM, the features are defined as fuzzy; however, in Section 7 we set concrete values corresponding to the traffic characteristics available in the SUMO simulator. For some of these features we select the weights according to real-world data that we describe in the following. When no data is available to justify a weight, we simply assign equal weights to all the possible values. In what follows we justify the chosen features:
(i) Weather. The weather types we have considered are Rainy, Stormy, Sunny, Windy, and Foggy. They are related by an inclusive-or, so at least one feature should be selected. In order to assign the weights, we have consulted the data of the Spanish National Institute of Meteorology, specifically the data of 2012 from the Malaga airport weather station. The weight assigned is the percentage of days of the year on which the associated condition held.
(ii) Emergency. We have considered whether an emergency situation has occurred or not because it could be important to know how traffic flow evolves in an emergency situation. Emergency values considered are Yes and No. When there is an emergency situation in the whole network, the reaction time is shorter and the traffic light programs have less time to help drivers arrive at their destination. They are related by an exclusive-or, so only one feature can be selected. They are equally weighted.
(iii) Vehicle. We represent the main vehicular characteristics under this mandatory feature. For some child nodes of the Vehicle feature it was not possible to obtain reliable public data, so in those cases we considered equal weights for the features. All child nodes of the Vehicle feature are also mandatory and are as follows:
(1) Vehicles Amount. The number of vehicles considered is Few or Many. They are related by an exclusive-or, so only one feature can be selected. They are equally weighted.
(2) Driver Imperfection. This feature represents how bad a driver is (low, medium, or high); for example, a high driver imperfection rate corresponds to a bad driver. In many traffic situations, the driver makes the difference. So, we have considered three levels of expertise. They are related by an exclusive-or, so only one feature can be selected. They are equally weighted.
(3) Driver Reaction Time. Time spent by the driver to carry out an action in the vehicle. We have considered three levels of reaction time (Slow, Medium, and Fast), in which Fast has a higher priority than Medium, and Medium has a higher priority than Slow. They are related by an exclusive-or, so only one feature can be selected.
(4) Vehicle Type. We have taken into account two types of vehicles: Light and Heavy. In order to assign the weights, we have consulted public data of the traffic control center (city hall) of Malaga. Most of the vehicles are light, so we have assigned a weight of 90% to them. They are related by an exclusive-or, so only one feature can be selected.
Besides the parent-child relationships, features can also be related across different branches of the feature model with the so-called Cross-Tree Constraints (CTC). In this model, we have generated three CTCs that are shown in the upper right corner of Figure 4 and are as follows: (i) Sunny → ¬Rainy, (ii) Stormy → DriverImperfection.High ∧ DriverReactionTime.Fast, (iii) Sunny → DriverImperfection.Low ∧ DriverReactionTime.Slow.
Let us explain our interpretation of the CTCs included in our traffic feature model. It is impossible for the weather to be sunny and rainy at the same time. We think this first constraint does not need any further explanation. On the one hand, when the weather is stormy, we consider that the driver's imperfection must be high and the driver's reaction must be fast because the driver is paying much more attention when the weather is bad. On the other hand, when the weather is sunny, we consider that the driver's imperfection must be low and the driver's reaction must be slow because the driver is much more relaxed.
Finally, this model represents 960 valid scenarios with different levels of importance and probability of occurrence. The ideal situation would be to generate an optimized traffic light program for each scenario, but this would be costly. Considering our previous results in program optimization for synchronizing traffic lights [9], the generation of 960 different cycle programs could take around 19 years of computation time. Moreover, here we have applied four different algorithms, so the computation time is multiplied by four, resulting in a total of 76 years of computation time. For this reason, the use of automatic intelligent algorithms to reduce the testing scenarios is mandatory for this task. Therefore, in the next section we explain how we reduce the number of testing scenarios without losing the capacity to measure the quality of the cycle programs.
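As an illustration of how the group relationships and the three CTCs restrict the scenario space, the following minimal sketch checks whether a combination of weather, driver imperfection, and driver reaction time is valid. It follows the textual interpretation of the CTCs given above; the function name `is_valid` and the string labels are assumptions made for this example and are not taken from the FAMA tool or from Figure 4.

```python
WEATHER = ("Rainy", "Stormy", "Sunny", "Windy", "Foggy")


def is_valid(weather, imperfection, reaction):
    """Return True if the selection respects the constraints described in the text.

    weather      -- set of selected weather features (inclusive-or group)
    imperfection -- one of "Low", "Medium", "High" (exclusive-or group)
    reaction     -- one of "Slow", "Medium", "Fast" (exclusive-or group)
    """
    weather = set(weather)
    # Inclusive-or: at least one known weather feature must be selected.
    if not weather or not weather.issubset(WEATHER):
        return False
    # CTC (i): Sunny excludes Rainy.
    if "Sunny" in weather and "Rainy" in weather:
        return False
    # CTC (ii): Stormy requires high imperfection and fast reaction.
    if "Stormy" in weather and (imperfection, reaction) != ("High", "Fast"):
        return False
    # CTC (iii): Sunny requires low imperfection and slow reaction.
    if "Sunny" in weather and (imperfection, reaction) != ("Low", "Slow"):
        return False
    return True


print(is_valid({"Sunny", "Windy"}, "Low", "Slow"))
print(is_valid({"Sunny", "Rainy"}, "Low", "Slow"))
print(is_valid({"Stormy"}, "Medium", "Fast"))
```

The exclusive-or groups are enforced here simply by passing a single value per group; the remaining features (emergency, number of vehicles, vehicle type) are not involved in any CTC and are therefore omitted from the check.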

Generation: PGFM Algorithm for Scenarios Generation
The Prioritized Genetic Feature Model (PGFM) algorithm is an evolutionary approach that constructs an entire test suite taking into account the feature model, the constraints between the features, and their priorities in the generation of the test suite of traffic scenarios. It is a constructive algorithm that adds one new valid scenario to the partial solution in each iteration until all pairwise combinations of features are covered. In each iteration, the algorithm tries to find the valid scenario that adds the most coverage to the partial solution. Algorithm 3 sketches the pseudocode of PGFM. As input, it receives a feature model for generating the test suite. Initially, the test suite is initialized with an empty list (Line (1)) and the set of remaining pairs (RP) is initialized with all the valid weighted pairwise combinations of features (Line (2)). In each iteration of the external loop (Lines (3)-(21)), the algorithm creates a random initial population of individuals (Line (5)) and enters an inner loop which applies the traditional steps of a generational evolutionary algorithm (Lines (6)-(18)). That is, some individuals (traffic scenarios in our case) are selected from the population, recombined, mutated, evaluated, and finally inserted in the offspring population. An individual only contains the selected features, so the operators only affect these features. The nonselected features are those which are not contained in the individual. If a generated offspring individual is not a valid scenario (i.e., it violates a constraint derived from the feature model), it is transformed into a valid scenario by applying a Fix operation (Line (12)) provided by the FAMA tool [48].
The fitness value of an offspring individual (Line (13)) is the sum of the weights of the weighted pairwise combinations that would still remain to be covered after adding the offspring solution to the test suite. This is a minimization fitness function that promotes the generation of scenarios that cover the pairs of features with higher weights. Note that, as the search procedure advances, the cost of computing the fitness function decreases, since each time fewer weighted pairwise combinations remain uncovered.
We set the configuration parameters of PGFM with values frequently used for genetic algorithms: one-point crossover strategy with a probability of 0.8, selection strategy binary tournament, population size of 10 individuals, mutation that iterates over all selected features of an individual and replaces a feature by another randomly chosen feature with a probability of 0.1, and termination condition of 1,000 fitness evaluations and full weight coverage in the external loop.
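The constructive loop and the fitness function described above can be sketched as follows. This is a simplified illustration and not the authors' implementation: the toy model `MODEL` contains only exclusive-or groups with no cross-tree constraints (so no Fix operation is needed), the weight of a pair is assumed to be the product of the weights of its two features, the replacement step is assumed to be elitist, and the initial population is seeded with one individual covering the heaviest remaining pair so that every outer iteration makes progress. All names are hypothetical.

```python
import random
from itertools import combinations

# Hypothetical toy feature model: exclusive-or groups with weighted values.
MODEL = {
    "Emergency": {"Yes": 0.5, "No": 0.5},
    "Amount": {"Few": 0.5, "Many": 0.5},
    "Imperfection": {"Low": 1 / 3, "Medium": 1 / 3, "High": 1 / 3},
    "Type": {"Light": 0.9, "Heavy": 0.1},
}
GROUPS = list(MODEL)


def pairs_of(scenario):
    """Pairs of (group, value) features covered by a scenario."""
    feats = [(g, scenario[i]) for i, g in enumerate(GROUPS)]
    return set(combinations(feats, 2))


def all_weighted_pairs():
    """Remaining-pairs set RP: every pair with its (assumed) product weight."""
    rp = {}
    for g1, g2 in combinations(GROUPS, 2):
        for v1, w1 in MODEL[g1].items():
            for v2, w2 in MODEL[g2].items():
                rp[((g1, v1), (g2, v2))] = w1 * w2
    return rp


def fitness(scenario, remaining):
    """Weight still uncovered after adding the scenario (to be minimized)."""
    covered = pairs_of(scenario)
    return sum(w for p, w in remaining.items() if p not in covered)


def random_scenario(rng):
    return [rng.choice(list(MODEL[g])) for g in GROUPS]


def evolve(remaining, rng, pop_size=10, max_evals=1000):
    """Inner GA: binary tournament, one-point crossover (0.8), mutation (0.1)."""
    seed = random_scenario(rng)
    for g, v in max(remaining, key=remaining.get):  # heaviest uncovered pair
        seed[GROUPS.index(g)] = v
    pop = [seed] + [random_scenario(rng) for _ in range(pop_size - 1)]
    evals = pop_size
    while evals < max_evals:
        offspring = []
        for _ in range(pop_size):
            p1 = min(rng.sample(pop, 2), key=lambda s: fitness(s, remaining))
            p2 = min(rng.sample(pop, 2), key=lambda s: fitness(s, remaining))
            child = list(p1)
            if rng.random() < 0.8:
                cut = rng.randrange(1, len(GROUPS))
                child = p1[:cut] + p2[cut:]
            for i, g in enumerate(GROUPS):
                if rng.random() < 0.1:
                    child[i] = rng.choice(list(MODEL[g]))
            offspring.append(child)
            evals += 1
        # Assumed elitist replacement: keep the best pop_size individuals.
        pop = sorted(pop + offspring, key=lambda s: fitness(s, remaining))[:pop_size]
    return pop[0]


def pgfm(seed=0):
    """Outer loop: add one scenario per iteration until all pairs are covered."""
    rng = random.Random(seed)
    remaining = all_weighted_pairs()
    suite = []
    while remaining:
        best = evolve(remaining, rng)
        suite.append(best)
        for p in pairs_of(best):
            remaining.pop(p, None)
    return suite


suite = pgfm()
print(len(suite), "scenarios, in priority order:")
for s in suite:
    print(dict(zip(GROUPS, s)))
```

Because the fitness rewards covering heavy pairs first, the scenarios are appended to the suite roughly in decreasing order of added weight, which is what allows the test suite to be truncated by priority.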
As a result of the execution of PGFM, Table 1 shows a list of 10 prioritized scenarios for testing. This list of scenarios (S_i with i ∈ [1, 10]) fulfills the pairwise coverage criterion, which ensures that all distinct traffic conditions are considered in our validation model. In this table, feature abbreviations in the first row are defined in Figure 4, corresponding to labels in the FM tree. A feature is included in the scenario if it is marked in the scenario's row. The last column indicates the total weight of the scenario according to the priority of the features included. We have to highlight the reduction achieved by considering only 10 scenarios out of the initial 960 valid scenarios, although the number of possible feature combinations that would need to be explored is 2^25. This is a successful reduction, especially if we compare the time required for an exhaustive experimentation (57 years on one single CPU) with the 8.3 days actually spent.
In this way, as shown in Table 1, the use of priorities led us to execute only the six most important scenarios, since they guarantee 99% coverage. This indicates that several scenarios should be considered, but some of them are more important than others and so should be tested first; for example, scenario S_1 alone covers 63.59% of the total weight of feature pairs. The optional features that are selected in this scenario (S_1) are as follows: it is sunny, the simulation is long, there are a lot of vehicles, the driver skills are medium, the driver reaction is fast, and there is a high percentage of light vehicles. If we test under the conditions defined by S_1, we know that the results provided by our optimization algorithms for finding the traffic light cycles cover 63.59% of the total weighted conditions of the city. In doing so, we add completeness to our study and strengthen its robustness in a holistic analysis of the whole city.

Generation of Traffic Light Programs: Malaga City Case Study
As we are interested in developing an optimization solver capable of dealing with close-to-reality and generic urban areas, we have generated an instance by extracting actual information from real digital maps. From this instance, considering the scenarios computed with the PGFM algorithm extracted from our FM (different traffic density, weather, etc.), the optimized cycle programs were generated. The selected urban area covers approximately 750,000 m² and is physically located in the city of Málaga in Spain. The information used is all real and concerns traffic rules, traffic element locations, buildings, road directions, streets, intersections, and so forth. Moreover, we have set the number of vehicles circulating, as well as their speeds, by following current specifications available from the Mobility Delegation of Málaga's City Council. This information was collected from sensorized points in certain streets, obtaining a measure of traffic density at different time intervals. In Figure 5, the selected area of Malaga city is shown with its corresponding captured views of OpenStreetMap and SUMO (as explained in Section 3.1). Located between the city center and the harbor, this instance comprises streets with different widths and lengths and several roundabouts. The main streets found in this area are Alameda Principal, Andalucía, Manuel Agustín Heredia, Colón, and Aurora. The area contains 70 intersections with 4 to 16 traffic lights at each one, adding up to a total number of 312 traffic lights. There are between 250 and 500 vehicles circulating during the analysis period; each one of the vehicles completes its own route from origin to destination circulating with a maximum speed of 50 km/h (typical in urban areas). The traffic flow is generated by means of the DUARouter tool [42], which computes the vehicle routes used by SUMO using shortest path computation and dynamic user assignment (DUA) algorithms.
With regard to the driver's features, there are two main aspects to characterize the driving style: imperfection and reaction, represented in SUMO by means of parameters σ and τ, respectively. The first one is a real number in the range [0, 1] for weighting the driver's skills. A high value of σ means that the driver commits many driving errors. In contrast, a low σ means good driving. The second parameter (τ) measures the driver's reaction time in seconds. In addition, we consider other parameters such as the simulation time, the number of vehicles, and the percentage of light and heavy vehicles. A heavy vehicle has lower acceleration, deceleration, and maximum velocity than a light vehicle. The optional features selected from our traffic model are the source of variability leading to different traffic scenarios.
Table 2 summarizes the parameter settings in terms of driving and simulation values with respect to each selected feature. For example, when selecting the feature Ra (rainy), parameters σ and τ are increased by 0.1 and 1, respectively. Regarding the emergency feature, if Yes is selected, the time to reach the destination is 500 seconds, but when No is selected this time is 1000 seconds. Note that the simulation time is set according to the emergency feature. So, we have scenarios with different analysis times. Finally, when feature Li is selected, there are 95% light vehicles and 5% heavy vehicles. In contrast, when Hv is selected, there are 85% light vehicles and 15% heavy vehicles.
The trace information obtained after the simulation of each traffic light program, for a given scenario, is used to compute a set of metrics: vehicles arriving at their destination (VLL), vehicles not arriving at their destination (VNLL), CO2 emissions (CO2), NOx emissions (NOx), fuel consumption (FUEL), and global journey time (GJT). In this way, it is possible to numerically quantify how good a given cycle program is and compare it with all other existing solutions, whether from human experts' programs or from automatic optimizers.
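The mapping from selected features to simulation parameters can be sketched as a small function. Only the adjustments explicitly mentioned in the text are encoded (Ra, Ys/No, Li/Hv); the remaining rows of Table 2 are not reproduced. The baseline values `BASE_SIGMA` and `BASE_TAU` are assumptions for this example (they coincide with SUMO's default `sigma` and `tau`), and the function and key names are hypothetical.

```python
BASE_SIGMA = 0.5  # assumed baseline driver imperfection, range [0, 1]
BASE_TAU = 1.0    # assumed baseline driver reaction time, in seconds


def scenario_parameters(features):
    """Translate a set of feature abbreviations into simulation settings.

    Assumes a valid scenario: exactly one of Ys/No and one of Li/Hv.
    """
    sigma, tau = BASE_SIGMA, BASE_TAU
    if "Ra" in features:  # rainy: worse driving and slower reaction
        sigma += 0.1
        tau += 1.0
    # The simulation (analysis) time depends on the emergency feature.
    simulation_time = 500 if "Ys" in features else 1000
    # Share of light vehicles depends on the vehicle type feature.
    light_share = 0.95 if "Li" in features else 0.85
    return {
        "sigma": min(sigma, 1.0),  # keep imperfection inside [0, 1]
        "tau": tau,
        "simulation_time_s": simulation_time,
        "light_share": light_share,
        "heavy_share": round(1.0 - light_share, 2),
    }


print(scenario_parameters({"Ra", "Ys", "Hv"}))
print(scenario_parameters({"Su", "No", "Li"}))
```

The resulting dictionary could then be used to fill in the vehicle type and simulation options of the simulator configuration for each scenario.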

Experiments
This section describes a series of experiments and analyses of the results obtained so as to empirically test our approach. This allows us to put into practice all the initial specifications of our strategy, explained in Section 4, in order to validate the resulting traffic light programs in the ten scenarios generated by the PGFM algorithm.

Experimental Setting.
As stated in the introduction, our analysis includes four specialized algorithms for computing cycle programs of traffic lights: PSOTL, DETL, RS, and SCPG. The first three optimization algorithms are nondeterministic, so we have carried out 30 runs and consequently have obtained 30 different (best) cycle programs for each algorithm. The fourth algorithm (SCPG) is a deterministic technique, so it only computes one cycle program. As our aim is to validate these cycle programs in different scenarios, in the case of nondeterministic algorithms we have carried out 30 independent runs of the simulator per generated scenario (10), algorithm (3), and proposed best cycle program (30), adding up to a total number of 27,000 executions. This accounts for the considerable effort we have made to study the city under very different traffic flow conditions in order to attain realistic results. In the case of SCPG, we have performed 300 executions (10 scenarios × 30 independent runs).
In order to check whether the differences between the algorithms are statistically significant or just a matter of stochastic noise, we have applied the nonparametric Wilcoxon rank-sum test [49]. The confidence level was set to 95% (p-value below 0.05). In addition, so as to properly interpret the results of statistical tests, it is always advisable to report effect size measures. For that purpose, we have also used the nonparametric effect size measure Â12 statistic proposed by Vargha and Delaney [50]. Effect size provides information about the magnitude of an effect, which can be useful in determining whether it is of practical significance or not.
All the executions were run in a cluster of 16 machines with Intel Core2 Quad processors Q9400 (4 cores per processor) at 2.66 GHz and 4 GB memory running Ubuntu 12.04.1 LTS, managed by the HT Condor 7.8.4 cluster manager. In order to offer an approximate idea of the computational effort required to solve this realistic task, we have to note that, in terms of a single core machine, our complete experimentation with 27,300 independent runs would require about 8 months of computation. In contrast, in the parallel platform used, the computation of the valid scenarios required 8.3 days (26.26 seconds per run).
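For reference, the Â12 statistic can be computed directly from two samples as the proportion of pairs in which the first sample yields the higher value, counting ties as one half. The sketch below uses only the standard library; the function name `a12` and the sample values are made up for illustration. The Wilcoxon rank-sum test itself is available, for example, as `scipy.stats.ranksums`, which is not used here in order to keep the example dependency-free.

```python
def a12(xs, ys):
    """Vargha-Delaney A12: estimated probability that a value drawn from xs
    is larger than a value drawn from ys, with ties counted as one half."""
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


# Hypothetical VLL samples for two algorithms (not data from the paper).
vll_a = [480, 476, 490, 485]
vll_b = [455, 470, 476, 460]
print(a12(vll_a, vll_b))
```

A value of 0.5 indicates that the two samples are stochastically equivalent, whereas values close to 1 (or 0) indicate that the first (or second) sample almost always yields the higher value.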
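The run counts and the 8.3-day figure quoted above can be checked with a few lines of arithmetic. The sketch only multiplies the numbers given in the text; the variable names are chosen for this example.

```python
SCENARIOS = 10
NONDETERMINISTIC_ALGORITHMS = 3  # PSOTL, DETL, RS
PROGRAMS_PER_ALGORITHM = 30
RUNS_PER_PROGRAM = 30
SECONDS_PER_RUN = 26.26

nondeterministic_runs = (
    SCENARIOS * NONDETERMINISTIC_ALGORITHMS * PROGRAMS_PER_ALGORITHM * RUNS_PER_PROGRAM
)
scpg_runs = SCENARIOS * RUNS_PER_PROGRAM  # a single deterministic program
total_runs = nondeterministic_runs + scpg_runs

total_days = total_runs * SECONDS_PER_RUN / 86_400  # seconds in a day

print(nondeterministic_runs, scpg_runs, total_runs)
print(round(total_days, 1), "days")
```

That is, 27,300 runs at an average of 26.26 seconds per run amount to roughly 8.3 days.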

Experimental Results.
In order to illustrate the many results obtained at a glance, Figure 6 shows the performance of the algorithms for the selected metrics (VLL, CO2, NOx, Fuel, and GJT) for each vehicle and all scenarios. These metrics are normalized for a proper visualization. In this figure, a first observation is that CO2, NOx, and Fuel are correlated in almost all algorithms, so we could treat them as one single factor. Nevertheless, as one of our main focuses is saving environmental resources and reducing emissions, we decided to use these metrics as nonaggregate values. The correlation of these results is due to the simulation strategy, which follows a realistic emission model (HBEFA).
A second observation in Figure 6 is that PSOTL obtains the maximum number of vehicles arriving at their destination with the lowest fuel consumption, and also with the lowest emissions of CO2 and NOx. However, for this algorithm, the global journey time (GJT) is the highest one. The explanation of this counter-intuitive result is that a vehicle which does not arrive at its destination in the analysis period is not used for the computation of the metrics. Thus, with the other algorithms several vehicles do not arrive at their remote destinations, so their global journey time is not increased by the statistics of the vehicles with remote destinations. In Figure 6, we also note that SCPG is the worst algorithm in terms of fuel consumption and CO2 and NOx emissions, even though it had a low number of vehicles that arrive at their destination. This means that the problem is highly difficult and lacks clear patterns for experts when we move to a large-scale optimization scenario.
In fact, since the number of vehicles that arrive at their destination during the study timeframe (VLL) is an important metric for evaluating how the system prevents traffic jams, we have organized, in Table 3, the average VLL for each scenario and algorithm. We have also highlighted the particular cycle program that obtains the best average VLL in the 30 independent executions. We can easily see that the cycle programs generated by PSOTL are always the best at preventing traffic jams, since they guarantee that almost all vehicles reach their destination within the analysis time. However, the cycle program that obtains the best result is not always the same in all scenarios. For example, for PSOTL, programs 3 and 20 (see the numbers in parentheses in Table 3) obtain the best results in 3 out of 10 scenarios. This means that there is no single cycle program that shows the best results in all possible traffic scenarios, as expected. These results lead us to consider, in Section 9, the priority assigned to each scenario in order to choose the best cycle program.
In order to check whether the differences between the algorithms are statistically significant or not, we have applied the nonparametric Wilcoxon rank-sum test. In Table 4 we show the number of scenarios where significant differences exist between the algorithms. In this regard, we can see that PSOTL is significantly different from RS for all the scenarios (thus, an intelligent algorithm is needed), whereas solutions provided by DETL are not statistically different from the ones provided by the human experts, represented by SCPG (so DETL is just equivalent to the experts' solutions, not better). RS is significantly better than SCPG and DETL in 3 and 6 scenarios, respectively. Table 4 also reports the effect size measure Â12 for the VLL metric for all scenarios. We interpret the Â12 statistic as follows: given a performance measure M, Â12 measures the probability that running algorithm A yields higher M values than running another algorithm B. In this regard, A represents algorithms in the columns and B represents algorithms in the rows. If these two algorithms are equivalent, then Â12 = 0.5. A value of Â12 = 0.3 entails that one would obtain higher values for M with algorithm A 30% of the time. As shown in this table, PSOTL's solutions are the best for all scenarios and their differences are of actual importance in practical cases of the city. Numerically speaking, these cycle programs (the ones generated by PSOTL) are better than the ones provided by DETL, RS, and SCPG by 79.01%, 71.29%, and 83.03%, respectively. In fact, these results are labeled as large effect size according to Cohen's d scale [51]. These results provide us with some insights into the successful application of PSOTL's cycle programs to real traffic light schedules.

Focusing on Scenarios.
From the point of view of the different generated scenarios, in Figure 7 we can observe the box plots of the resulting traces in terms of metrics for all algorithms. Box plots display variation in samples of a statistical population without making any assumptions about the underlying statistical distribution. Box plots show the mean, the interquartile range (in color), and the maximum and minimum values at the extremes. In the box plots of Figure 7, there are no outliers, which are observation points that are distant from other observations. Four metrics are used, two of them measured per vehicle (CO2 and GJT). Specifically, the second subfigure shows the percentage of time that the journey takes compared to the total analysis time; for example, a value of 40% means that the average vehicle takes 40% of the analysis time to reach its destination. The other two measures used are VLL and VNLL. Note that we do not show the resulting box plots for VNLL because they are complementary to the results obtained for VLL. These diagrams focus on the scenarios in order to show the variability of traffic measures, while at the same time we have tried to show the results without the specific bias introduced by a particular solving technique.
An interesting observation in Figure 7 lies in the fact that Scenario 6 (S_6) is the most favorable for the GJT and VLL indicators, whereas for the CO2 indicator it obtains the third best result. The S_6 scenario induces a successful performance of cycle programs. In fact, the main features of S_6 are ideal for enhancing the traffic flow: sunny weather (Su), no emergency (No), few vehicles (Af), and more drivers with a high level of expertise (Il). In contrast, the most unfavorable scenario is S_5, since only a few vehicles arrive at their destination, and the CO2 emitted into the atmosphere per vehicle is the highest in our study. This last scenario has unfavorable weather conditions: rainy (Ra), stormy (St), and foggy (Fg). In addition, there is an emergency (Ys), the number of vehicles is higher (Am), and there are more heavy vehicles (Hv). All these features induce adverse conditions in this scenario (S_5), which negatively influence the traffic flow.
In light of these results, we claim that providing experts with better general programs is possible by using our approach. We consider different traffic conditions in a set of modeled scenarios, which provides the experts with unbiased traffic light programs. In contrast, the aforementioned literature (see Section 2) only considers ideal traffic conditions. In addition, here we go one step further by weighting those features with a higher probability of occurrence, but without missing those traffic situations that are more extreme.

Prioritized Analysis
In this section, as far as we know, we are the first to apply prioritization techniques that consider all possible traffic scenarios at a time. We analyze how each generated cycle program works for all scenarios as a whole. Since not all scenarios are equally important or frequent, the computed metrics are weighted according to the scenario's priority.

Analysis of Best Performing Cycle Programs.
Designing robust traffic light programs is a mandatory task when tackling vehicular urban environments. In this regard, considering all modeled scenarios as a whole helps us to give robustness to our solutions, so we have weighted these scenarios according to their priority.
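The priority-weighted aggregation can be illustrated with a minimal sketch in which each cycle program receives the weighted average of a metric over all scenarios and the programs are then ranked by that score. The weights, program names, and metric values below are hypothetical and chosen only to show the mechanism; they are not taken from Tables 1, 3, or 5.

```python
def weighted_score(per_scenario, weights):
    """Weighted average of a metric over scenarios, using scenario priorities."""
    total_weight = sum(weights.values())
    return sum(weights[s] * per_scenario[s] for s in weights) / total_weight


def rank_programs(programs, weights, higher_is_better=True):
    """Return program names ordered from best to worst weighted score."""
    scores = {name: weighted_score(vals, weights) for name, vals in programs.items()}
    return sorted(scores, key=scores.get, reverse=higher_is_better)


# Hypothetical priorities and VLL values for two scenarios and two programs.
weights = {"S1": 0.6, "S2": 0.4}
vll = {
    "program_A": {"S1": 100, "S2": 50},  # best in the top-priority scenario
    "program_B": {"S1": 90, "S2": 90},   # more balanced across scenarios
}
print(rank_programs(vll, weights))
```

In this toy example the program that wins in the highest-priority scenario is not ranked first overall, because the balanced program obtains a higher weighted average. For metrics to be minimized, such as CO2 or GJT, `higher_is_better=False` reverses the ordering.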
In Table 5, we report our top ten cycle programs for the following metrics: VLL, FUEL, CO2, and GJT. We have to highlight that the PSOTL13 cycle program is the best solution in terms of VLL for the most prioritized scenario (S_1 in Table 3). However, when we consider the scenarios altogether, PSOTL13 is not the best for VLL, so it appears in second place for this metric in this ranking. This fact was unexpected because of the high weight of S_1; nevertheless, the best one for all scenarios as a whole (PSOTL20) is the best in three different scenarios (S_4, S_7, and S_10 in Table 3).
In Table 5, we can observe the cycle program that is the best for each metric in all scenarios together. So, we only have to decide which metric is more important for us and then set a particular configuration. It is possible that different stakeholders could be interested in setting up different cycle programs. For example, the municipal authorities of the city may be interested in avoiding traffic jams, so the GJT metric should be optimized. However, the national government could impose a maximum rate of CO2 emissions, so the CO2 metric also ought to be minimized; thus the best option would be to configure the traffic lights with the program PSOTL20.

Practical Benefits.
Our validation strategy provides experts with many practical benefits. In general, we have alleviated the work of the experts, with the cost reduction this implies, just by setting the priorities in the feature model. From the feature model, they obtain several validation scenarios that fulfill the pairwise coverage criterion and provide us with some confidence for choosing the best traffic light program. In addition, the priorities of scenarios allow us to select a good solution for most traffic conditions taking into account the weight of the scenarios, previously computed by means of the PGFM algorithm.
Another practical benefit is the flexibility of the proposed approach. Since it is impossible to objectively decide which cycle program of traffic lights is the best, we provide several solutions according to the desired behavior of the system. As we have previously said, it is possible that different stakeholders could be interested in setting up a different cycle program.

... experimentation aimed at testing our strategy on a large number of different cycle programs, generated by automatic optimizers, as well as by human experts' procedures. The main conclusions that we can draw are as follows:
(i) We propose the use of a traffic feature model with priorities, which allows us not only to reduce the number of traffic scenarios needed to test the available cycle programs but also to generate the most important scenarios. This can be carried out by means of the PGFM algorithm, proposed in Section 6. This algorithm is able to automatically generate a minimal prioritized test suite from a given feature model. Experts can now work with a reduced set of scenarios with similar fault detection capacity, which means a great cost reduction.
(ii) Our validation strategy allows the experts to numerically quantify the quality of a traffic light cycle program on different scenarios. This quantification is performed on different scoring metrics, like vehicles that arrive at their destination, CO2 emissions, NOx emissions, global journey time, and so forth. We could choose a specific cycle program depending on the traffic conditions and the metric we want to optimize. In this way, we offer a wide range of solutions according to the stakeholder interests.
(iii) We have experimentally shown, with numerical models and results (not just with intuitive beliefs), that there is no single specific cycle program which is the best for all scenarios. Nevertheless, for the sake of an easy decision-making process, we also provide a method to rank the cycle programs. This is possible because we have taken into account the scenario's priority automatically computed from our feature model.
(iv) With regard to our case study, we have analyzed how the cycle programs generated by four algorithms (PSOTL, DETL, RS, and SCPG) adapt to different traffic conditions (ten different scenarios) in Malaga city. We have compared the generated cycle programs with the solution provided by the experts (SCPG). Considering the prioritization of scenarios, the improvement achieved in solution quality is remarkable, especially for CO2 emissions, in which we have obtained a reduction of 126.99% compared with the experts' solutions.
As a matter of future work, we plan to consider several scenarios in a single fitness evaluation of solutions in the optimization algorithms. In this way, we expect to steer the search procedure of the algorithms toward cycle programs suited to the most influential traffic conditions. Moreover, we plan to deal with a multiobjective model of the problem according to the stakeholders' interests. As a result, we will provide a Pareto front, the objectives of which are the minimization of CO2 emissions, the minimization of the global journey time, and the minimization of traffic jams.

Figure 1: Cycle program (phase duration) of traffic lights within intersections. Dashed circles indicate where traffic lights are located in the real intersection.

Figure 3: Modeling, optimization, generation, and validation phases of our proposed strategy.

Figure 4: Traffic management feature model with priorities.

Figure 5: Malaga scenario instance. Selection from OpenStreetMap and exportation to SUMO format.

Figure 6: Normalized star diagram containing traffic accuracy metrics for all algorithms compared.

9. 1 .
Analysis of Best Performing Cycle Programs.Designing robust traffic light programs is a mandatory task when

Figure 7: Boxplots of the resulting distribution traces of our studied metrics: CO2, GJT, and VLL.
Input: A scenario instance of a given city
Output: Best found solution encoding a cycle program
(1) S ← initializeSwarm()
(2) while t < MAXIMUM_t do
(3) for each particle x_i in S do

Table 1: Computed scenarios by PGFM with pairwise coverage criterion. Feature abbreviations are defined in Figure 4.

Table 3: Best average VLL for each algorithm and scenario. The specific cycle program (out of 30) that generates the best result for this algorithm and scenario is indicated in parentheses.

Table 4: Number of scenarios where significant differences exist for VLL between the algorithms (#) and Vargha and Delaney's statistical test results (Â12).