Troubleshooting Assistance Services in Community Wireless Networks

We have identified new services intended for users and administrators of community wireless networks. Troubleshooting assistance services will assist users in solving communication problems, gather data for expert analysis, inform the user about the state of the network (including outages), and so forth. Network administrators will be provided with a unique tool supporting network analysis, operation, and development. We have mainly focused on the use cases and on their prerequisite: the problem of topology discovery.


Introduction
Community Wireless Networks (CWNs), a phenomenon of the last decade, differ in many ways from the usual enterprise computer networks or access networks of Internet service providers (ISPs). Common problems appearing in these networks have to be addressed with regard to their specifics.
Our experiments were done in a local community wireless network (http://www.hkfree.org/) operating in the town of Hradec Kralove and the surrounding conurbation.

Community Wireless Network Characteristics
We deal with the specifics of CWNs in the Czech Republic in this paper, but many conclusions can be generalized. We will briefly discuss individual differences. The history of CWNs in the Czech Republic starts in 2002, when the first small neighborhood networks appeared in Prague and other places. Their advantages are economies of scale, removal of the margin of a commercial ISP, and the possibility to influence the functioning of the network itself and to contribute to it (e.g., to implement one's own services related to the network). The development of these networks was stimulated by the excessively high price of fast and unlimited Internet connection (broadband) and by the decreasing price of electronics, especially components for building wireless radio networks (WiFi cards) [1].
Later, they evolved from unorganized neighborhood networks into large organized communities that form a civic association or take another legal form.
Other terms such as Mobile Ad Hoc Network (MANET) and Wireless Mesh Network (WMN) are related to CWNs; Mahmud et al. [2] explain these types of network in detail. CWNs, at least in the form in which they occur in the Czech Republic, do not fully comply with the definition of either MANETs or WMNs. Their technical characteristics are closer to those of ISP access networks, but with different last-mile and backbone technologies. For example, unlike ADSL, they use low-cost outdoor WiFi links. Spontaneous changes in topology and outages are more frequent.
Worldwide, CWNs are typically rather simple (often homogeneous) WMNs or communities sharing their own connectivity with mobile users (FON and others). Furthermore, they solve the problem of reconstructing Internet infrastructure after natural and other disasters. Many networks decline as the penetration of free WiFi in cities (cafes, hotels, etc.) increases. In contrast, the community networks in the Czech Republic are expanding. They are also professionalizing their technical support. From an organizational point of view, however, they remain primarily nonprofit organizations that involve users in network development. These are the key aspects for the proposed troubleshooting assistance services.
CWNs are a phenomenon not only in the Czech Republic [3,4], but here they attract enormous public interest. For a general idea of their size, we can use the statistics of the NFX association, which covers most community networks in the Czech Republic. It registers 41,000 users (households) altogether in its member networks. For comparison, the Czech Statistical Office reports 4,150,000 households [5]. By a rough estimate, about 1% of Czech households are in community networks.
CWNs usually provide fixed as well as mobile Internet connectivity using WiFi access technology. Some communities support "roaming" between particular access points within their network or with other networks.

Troubleshooting Services
Efforts to finance development effectively and to improve the safety and quality of services gave rise to a natural need for central supervision and partial planning of the topology. Because of the previous spontaneous development of these networks, however, this need is difficult to satisfy: the whole topology, the utilisation of individual links, their types, and other data are often unknown. The aim is to perform topology discovery in these spontaneously built wireless networks with regard to their specific environment and to suggest a concrete, applicable approach. The topology information is necessary for later troubleshooting assistance.
During the development of community networks, the need arose to obtain a clear and reliable picture of the entire network topology. With this information it is possible to develop the network further efficiently, solve problems in the network, or clarify dependencies between individual nodes according to the real topology. By analyzing the dependencies, we can properly inform users about planned network outages, since it is known on which nodes and connections the end node (user) functionally depends. In addition, knowledge of the topology may be used to support the analysis of unplanned outages by network users, as outlined below.
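The dependency analysis mentioned above can be illustrated with a minimal sketch. It is not the implemented service: the topology is assumed to be an undirected adjacency map, and the node names and the function `affected_nodes` are hypothetical. A node is treated as affected by a planned outage if it can no longer reach the Internet gateway once the nodes taken out of service are removed from the graph.

```python
from collections import deque

def reachable(topology, start, removed):
    """Return the set of nodes reachable from start, skipping removed nodes."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in topology.get(node, ()):
            if neighbor not in seen and neighbor not in removed:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def affected_nodes(topology, gateway, failed):
    """Nodes that lose their path to the gateway when the failed nodes go down."""
    failed = set(failed)
    still_connected = reachable(topology, gateway, failed)
    return set(topology) - still_connected - failed

# Hypothetical topology: r3 is reachable through r1 or r2, user_a only through r3.
topology = {
    "gw": ["r1", "r2"],
    "r1": ["gw", "r3"],
    "r2": ["gw", "r3"],
    "r3": ["r1", "r2", "user_a"],
    "user_a": ["r3"],
}

to_notify = affected_nodes(topology, "gw", ["r3"])
```

In this example only `user_a` would need to be notified about an outage of `r3`, whereas an outage of `r1` affects nobody because `r2` provides a redundant path.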
From the perspective of network analysis, community wireless networks have their positives and negatives. The positive aspect is the active involvement of users in network operations: users may participate in topology discovery, network monitoring, and other tasks. The negative aspect is the considerable heterogeneity of the networks and the use of low-cost network elements, which often lack the monitoring and management tools that are commonly available in the business sector.
Especially in ad hoc networks or other spontaneously created networks, a supportive communication and information system could be a benefit. Its aims are the following: assisting a user in solving communication problems, preparing records of a problem for possible analysis by an expert, and improving the user's awareness of the state of the network (planned outages). It will also assist network administrators in solving problems and serve as a tool for automatic documentation and for designing future changes.
Use Cases. In CWNs, there are two types of users who benefit from the supportive information system: ordinary users (community members) and network administrators.
Here are the typical use cases (scenarios) for both roles. An ordinary user (community member) needs to be informed about planned outages, and not only in advance: the solution should also work when he or she connects to the network during the outage. The system will support the exchange of outage information between peers. For example, a peer that was informed in advance will later inform a peer that connected during the outage.
An ordinary user should be able to solve basic problems with his or her connectivity, with the assistance of the system. The system should be able to localize the cause of the failure, suggest a solution (including contacting the appropriate responsible person), and provide the necessary observation data.
Network administrators have to develop the network efficiently and solve common operational problems. The troubleshooting services provide them with network topology information, including historical data and continuous changes. The services help with identifying and solving common topology and configuration issues and with the analysis of outages. A network administrator will announce planned outages to the affected users by entering this information through the troubleshooting services.
The services are not focused purely on central administration; they will also allow the user to analyze the network using software running on his or her computer. The user will thus indirectly help administrators solve network-related problems by providing information of an expert nature.
The services will also efficiently inform users of planned outages. Consider a peer that could not be informed about an outage because it was out of reach of the network or switched off when the outage was planned. After joining the network, it is not able to communicate with the central node (server) that originally propagated the information. Using the troubleshooting services, the peer will find active and available neighboring peers and ask them for information about the failure. A neighboring peer remembers (caches) this information from the time it was announced. As a result, the user is correctly informed about the planned outage, including its duration.
The service will automatically identify an unscheduled outage of the primary link that the user uses to access the Internet and will inform the user that a backup path is in use and may have worse connection parameters.

Architecture. The services are deployed on all types of peers: the supportive server, the administrator's PC, and users' devices (at least PCs). We discuss the communication architecture later. Each service will collect the available information about the network at any node (including end-user nodes) and analyze it in order to gain information about the topology of the network, its utilization, bad or improper configuration of some nodes, and safety risks and attacks. The analysis results will be provided to other nodes for global analysis.
Such a system can be characterized briefly as a distributed database system or a peer-to-peer database. This database provides information for local analysis (i.e., problems with the node and its environment and the root causes of these problems). The database also enables global analysis, which yields information about the state of the network and helps solve its main problems, such as optimization of the network, routing, and so forth.
The fact that the network is both the communication medium and the observed subject introduces new challenges. The communication architecture described later respects this peculiarity.

Topology Discovery Modules
A model of the network topology is essential for later troubleshooting assistance. We will briefly describe the particular modules for topology discovery on layer 3 (L3) and layer 2 (L2) of the ISO/OSI model and the method of link-type classification. These methods were specially designed for community wireless networks. Common solutions for topology discovery in enterprise networks are not applicable here, as they rely heavily on the Simple Network Management Protocol (SNMP). Unfortunately, CWNs are built using low-cost consumer-market network components that lack SNMP support.

L3 Topology. Community networks widely use the OSPF dynamic routing protocol, which is useful for our application.
Routers and links are often connected into redundant structures so that another path can be used in case of failure. In such an event it is necessary to update the routing table entries immediately. The exchange of information among the routers and the consequent updates of the tables are specified by the given routing protocol.
The fact that each dynamic router in the network holds the network graph in its data structures can be used to obtain the current state of the network and to visualize the overall topological map.
Dijkstra's algorithm is used to construct the shortest-path tree, based on edge labels. These labels in the network topology database use a special metric called the cost. This metric is set for each link separately and expresses the preference for the given direction of the link. The lower the cost, the more the link is preferred. The router administrator can influence the preference of individual links by setting their costs. The most important path is usually the one towards an Internet gateway, so he or she chooses the link costs so as to prefer a qualitatively better link and to use other links as a backup in case the primary link drops out, and as part of paths to other locations in the network (for intranetwork communication). Load balancing using equal-cost multipath over several links is not commonly used but is also possible. From the OSPF point of view, every point-to-point link is composed of two oppositely oriented edges, and the cost of each edge can be chosen. In the case of multipoint links, the cost from a given node via a given link is common regardless of the target node on this link.
The OSPF protocol is based on the so-called link-state algorithm, designed to distribute changes in connections between routers. Based on this information, each router in the network forms a model of the OSPF network, a topological map of the whole network area. This map can be represented by a directed graph with edge labels based on the costs assigned to each link in both directions. Then (and after each topological change) each router calculates the shortest-path tree rooted at itself using Dijkstra's algorithm applied to the graph. Entries in the routing table are created, modified, or deleted according to these paths.
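The shortest-path tree computation on a directed, cost-labeled graph can be sketched as follows. This is a generic illustration of Dijkstra's algorithm, not code taken from any routing daemon; the graph representation (a map from node to its outgoing edge costs), the node names, and the cost values are assumptions made for the example.

```python
import heapq

def shortest_path_tree(graph, root):
    """Dijkstra's algorithm: return (cost, parent) maps of the tree rooted at root.

    graph maps node -> {neighbor: cost of the directed edge node -> neighbor}.
    """
    cost = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > cost[node]:
            continue  # stale heap entry, a cheaper path was already found
        for neighbor, edge_cost in graph.get(node, {}).items():
            candidate = dist + edge_cost
            if candidate < cost.get(neighbor, float("inf")):
                cost[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return cost, parent

def path_to(parent, target):
    """Rebuild the path from the tree root to target using the parent map."""
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]

# Hypothetical network with asymmetric costs on the links a-c and b-c.
graph = {
    "gw": {"a": 10, "b": 10},
    "a": {"gw": 10, "c": 10},
    "b": {"gw": 10, "c": 100},
    "c": {"a": 50, "b": 10},
}

cost_from_gw, parent_from_gw = shortest_path_tree(graph, "gw")
cost_from_c, parent_from_c = shortest_path_tree(graph, "c")
downstream = path_to(parent_from_gw, "c")  # path used from the gateway to c
upstream = path_to(parent_from_c, "gw")    # path used from c to the gateway
```

Because the two directions of a link carry independent costs, the downstream path here runs through `a` while the upstream path runs through `b`.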
Using the OSPF protocol has the following advantages [6].
(i) Routers know the topology of the whole network area.
(ii) Fast convergence: routers spread topology change data immediately and then use the information for the calculation. Some other protocols are designed to perform the calculation first and then spread the information further. Convergence is, of course, adversely affected by frequent changes of link states (flaps).
(iii) Event-driven distribution of information about link states: there is no need for periodic updates at short intervals; information is spread when there is a change.
Currently, the topology of our subject network (http://www.hkfree.org/) consists of 161 routers and approximately 3440 connected workstations (using the Internet connectivity service) [7]. The OSPF routing protocol, which is suitable for such a large network, is used for internal routing. Most routers in this network are common computers running the GNU/Linux operating system. The Quagga routing daemon (open-source routing software that is a fork of GNU Zebra) is deployed as the implementation of the OSPF protocol.
The routing software is managed through a command-line administration console. It provides the ability to retrieve data from the network topology database of one of the routers; these data are the input for the topology discovery module.
An administrator can hardly get an overview of the structure of such a wide network of routers and cannot easily find the key nodes. Searching for a specific router among dozens can be quite a challenging task. It is, however, expected that the administrator knows the approximate geographic location of the router(s). That is why we require a feature that lays out the network graph according to real geographical positions, assuming that the positions of some network devices are available. Some known anomalies may occur in the network configuration. A typical one is the assignment of asymmetric costs to the two directions of the same link. This situation may be the administrator's intention or a misconfiguration. The troubleshooting service should be able to warn administrators about these anomalies and highlight them visually in an appropriate manner in the network map.
As the service provides a view of the network state at a certain moment, the administrator may lack information about a router or link that was not active at that moment. For this purpose, an archive of OSPF database snapshots taken at regular intervals is created and made available via HTTP. The history analysis mode is suited to detecting hot spots where frequent (i.e., unwanted) topology changes occur. It also allows the historical states of the network to be examined, so that we are able to identify ex post the possible causes of a failure, such as a temporary malfunction or misconfiguration of a router.
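Hot-spot detection over archived snapshots can be sketched as follows. The sketch assumes that each snapshot has already been reduced to a set of links, each link being a pair of router identifiers; this representation, the function names, and the `min_changes` threshold are hypothetical and do not describe the format of the actual archive.

```python
from collections import Counter

def link_changes(previous, current):
    """Links that appeared or disappeared between two consecutive snapshots."""
    return previous ^ current  # symmetric difference of the two link sets

def hot_spots(snapshots, min_changes=2):
    """Count link changes per router and keep routers with at least min_changes."""
    changes = Counter()
    for previous, current in zip(snapshots, snapshots[1:]):
        for link in link_changes(previous, current):
            for router in link:
                changes[router] += 1
    return {router: n for router, n in changes.items() if n >= min_changes}

# Hypothetical archive: the link r2-r3 flaps, later a new link r3-r4 appears.
snapshots = [
    {("r1", "r2"), ("r2", "r3")},
    {("r1", "r2")},
    {("r1", "r2"), ("r2", "r3")},
    {("r1", "r2"), ("r2", "r3"), ("r3", "r4")},
]

unstable = hot_spots(snapshots)
```

Routers `r2` and `r3` are reported because the link between them disappeared and reappeared, which is the kind of flapping an administrator would want to inspect.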
For network analysis, the link cost data can further be used to visualize the primarily used path between two nodes, for example, the Internet gateway and a chosen node.
The implementation of the troubleshooting service displays the costs as edge labels (see Figure 2). It can display the tree of shortest paths from a given node to all other nodes in the network, as well as the shortest path between two nodes; in the case of asymmetrically labeled edges (input and output costs differ), two different paths may exist, depending on the direction of the data flow. The service can also highlight asymmetrically labeled links. Such an anomaly is not always deliberate, and the highlighting can help reveal a mistake that could otherwise stay unseen by the administrator and manifest itself sometime in the future due to other changes in the topology. The service should support interactive design of the network so that configuration changes can be tested and the whole topology troubleshot before real implementation.
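Detecting asymmetrically labeled links is a simple pass over the directed graph. The following sketch assumes the same kind of representation as a cost-labeled adjacency map (node to outgoing edge costs); the function name and the example costs are hypothetical, and the sketch covers point-to-point links only.

```python
def asymmetric_links(graph):
    """Return (a, b, cost_ab, cost_ba) for links whose two directions differ in cost.

    graph maps node -> {neighbor: cost of the directed edge node -> neighbor}.
    Each link is reported once, with a < b, and only if both directions exist.
    """
    found = []
    for a, neighbors in graph.items():
        for b, cost_ab in neighbors.items():
            cost_ba = graph.get(b, {}).get(a)
            if a < b and cost_ba is not None and cost_ab != cost_ba:
                found.append((a, b, cost_ab, cost_ba))
    return sorted(found)

# Hypothetical costs: the link a-c is configured as 10 one way and 50 the other.
graph = {
    "a": {"b": 10, "c": 10},
    "b": {"a": 10},
    "c": {"a": 50},
}

to_highlight = asymmetric_links(graph)
```

Whether a reported link is a deliberate preference or a misconfiguration cannot be decided automatically, which is why the result is only meant to drive highlighting in the map.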

L2 Topology. L2 topology discovery at the scale of the whole network is very problematic without SNMP support. Many standard protocols, such as the Cisco Discovery Protocol (CDP) or the Link Layer Discovery Protocol (LLDP), are implemented in devices outside the consumer market, which also support SNMP. None of these protocols is usable in a network based on consumer devices.
The Link Layer Topology Discovery (LLTD) protocol is a promising solution in CWNs. It is implemented in some consumer devices and especially in the MS Windows operating systems from Vista onward. In a similar way, it would be possible to use the basic ARP protocol to get an overview of the number of active devices in a given subnetwork.
The problem is that the mentioned techniques work at the L2 level only. To transfer the obtained information about a local L2 topology, it is necessary to use a proxy that transforms the information from the given subnetwork into a form that can be transported to the central repository (topology server).
In this respect, however, community networks offer the potential of involving their members in the topology discovery process. In many places it can be more feasible to deploy monitoring software on users' PCs than on the active network elements (or than to replace these elements with modern ones equipped with SNMP). A service component deployed on the user's computer would mediate the exchange of information at least about the topology of the L2 segment to which the computer is connected. By aggregating this information, it is possible to discover at least a part of the network "peripheries" and to contribute more information to the overall view of the network. In this procedure, we can see certain parallels with multiagent systems, which represent a good theoretical basis and offer existing frameworks for the implementation. The communication architecture described later may be implemented using a multiagent framework.
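The aggregation step on the topology server can be sketched as follows. Everything in this sketch is an assumption made for illustration: the report format (a reporting peer, a segment identifier, and the set of hardware addresses the peer observed on its segment), the function name, and the example addresses. How a peer obtains its observations (LLTD, ARP, or otherwise) is outside the sketch.

```python
from collections import defaultdict

def aggregate_reports(reports):
    """Merge per-peer observations into one set of devices per L2 segment."""
    segments = defaultdict(set)
    for report in reports:
        devices = segments[report["segment"]]
        devices.add(report["reporter"])   # the reporting peer is on the segment
        devices.update(report["seen"])    # plus every device it observed
    return dict(segments)

# Hypothetical reports from three user PCs on two segments.
reports = [
    {"reporter": "02:00:00:00:00:01", "segment": "seg-A",
     "seen": {"02:00:00:00:00:02", "02:00:00:00:00:03"}},
    {"reporter": "02:00:00:00:00:02", "segment": "seg-A",
     "seen": {"02:00:00:00:00:01", "02:00:00:00:00:04"}},
    {"reporter": "02:00:00:00:00:09", "segment": "seg-B",
     "seen": set()},
]

periphery = aggregate_reports(reports)
```

Two partial views of `seg-A` combine into four known devices, which illustrates how a few participating users can reveal more of a segment than any one of them sees.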
Motivating users to install software that collects topology information on their computers will be a specific issue.

Wireless Links Classification.
For the administration of the network and the planning of its development, the described topological map is not sufficient. To support decisions, it is useful to find out the technological level of every link used and, ideally, also its concrete parameters, such as bandwidth. In fixed enterprise networks it is not a problem to find out the type (and the corresponding technology) of the individual interfaces of a particular router or switch (e.g., via SNMP); for wireless links this procedure is not always possible. A link realized by a device connected via a common fixed technology (e.g., 100BASE-TX) can be fully transparent at the L2 level but have a smaller bandwidth and other different parameters.
So far, only preliminary research into point-to-point link classification based on statistical analysis has been performed in this field. As input data, we used the measured latency characteristics of each link: minimal, maximal, and average latency in milliseconds. Let us note that the results of measurements on an unsaturated link did not have sufficient information value. That is why it was necessary to saturate the link systematically during the measurement. During our experiments, we found parameters for the MTR tool that saturated the measured link in an appropriate manner:
mtr --report -c 10000 -i.000001 -s1472 $ip
In the http://www.hkfree.org/ network, most links are wireless radio links in the 5 GHz (half-duplex) and 10 GHz (full-duplex) bands, so the classification task was reduced to dividing links into these two groups. Conclusions drawn from descriptive statistics of the measured data (e.g., that the measured characteristics do not always have a normal distribution) determined the classification methods used.
Binary logistic regression and, as a reference method, the k-nearest neighbors algorithm from the field of machine learning were used for classification.
In both methods, minimal and average latency were identified as important explanatory (independent) variables. Using binary logistic regression, we created the following classification model.
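Classification function: $L(X) = -3.063 + 0.027 \cdot \mathrm{LAT}_{\mathrm{AVG}} + 0.587 \cdot \mathrm{LAT}_{\mathrm{MIN}}$, where $\mathrm{LAT}_{\mathrm{AVG}}$ and $\mathrm{LAT}_{\mathrm{MIN}}$ are the average latency and the minimal latency measured on the particular link using the mtr tool with the parameters specified earlier.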
The threshold probability value is 0.5. That is, if the probability predicted by the model is above this threshold, the link is classified as a 5 GHz link according to our model. Otherwise, it is classified as a 10 GHz link. Figure 3 describes the training and validation sets of the k-nearest neighbors method.
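Applying the classification model to a measured link can be sketched as follows. The coefficients are those of the classification function above, with the decimal commas read as decimal points (-3.063, 0.027, 0.587). Two things are assumptions of this sketch rather than statements of the source: the probability is obtained from L(X) with the standard logistic function, and a probability strictly above 0.5 selects the 5 GHz class. The latency values in the example are invented.

```python
import math

# Coefficients of the classification function (decimal commas read as points).
INTERCEPT = -3.063
COEF_LAT_AVG = 0.027
COEF_LAT_MIN = 0.587
THRESHOLD = 0.5

def linear_score(lat_avg_ms, lat_min_ms):
    """L(X) for a link with the given average and minimal latency in ms."""
    return INTERCEPT + COEF_LAT_AVG * lat_avg_ms + COEF_LAT_MIN * lat_min_ms

def probability_5ghz(lat_avg_ms, lat_min_ms):
    """Assumed standard logistic link: P = 1 / (1 + exp(-L(X)))."""
    return 1.0 / (1.0 + math.exp(-linear_score(lat_avg_ms, lat_min_ms)))

def classify(lat_avg_ms, lat_min_ms):
    """Return the predicted band of the link."""
    if probability_5ghz(lat_avg_ms, lat_min_ms) > THRESHOLD:
        return "5 GHz"
    return "10 GHz"

# Invented measurements, not taken from the experiments.
low_latency_link = classify(lat_avg_ms=1.0, lat_min_ms=0.5)
high_latency_link = classify(lat_avg_ms=50.0, lat_min_ms=5.0)
```

With a threshold of 0.5 the decision reduces to the sign of L(X), so higher measured latencies push a link towards the 5 GHz class.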
The results of both methods were compared by 10-fold cross-validation using a training set of 71 links. The overall rate of correct classification was 91.5% for logistic regression and 95.8% for k-nearest neighbors. This shows that both methods are well suited to the classification.
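The reference method and its evaluation can be sketched in a few lines. The sketch makes several assumptions that the text does not state: k = 3, unscaled Euclidean distance over the pair (average latency, minimal latency), and a simple interleaved split into folds. The ten samples are invented for illustration and are not the measured set of 71 links, so the resulting accuracy says nothing about the reported results.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """Majority vote among the k training samples nearest to point."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cross_validate(samples, folds=10, k=3):
    """Fraction of correct predictions; fold i holds out samples i, i+folds, ..."""
    correct = 0
    for fold in range(folds):
        held_out = samples[fold::folds]
        train = [s for i, s in enumerate(samples) if i % folds != fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(samples)

# Invented (lat_avg_ms, lat_min_ms) samples with their link type.
samples = [
    ((20.0, 4.0), "5 GHz"), ((25.0, 5.0), "5 GHz"), ((22.0, 4.5), "5 GHz"),
    ((30.0, 6.0), "5 GHz"), ((27.0, 5.5), "5 GHz"),
    ((2.0, 0.5), "10 GHz"), ((3.0, 0.6), "10 GHz"), ((2.5, 0.4), "10 GHz"),
    ((4.0, 0.8), "10 GHz"), ((3.5, 0.7), "10 GHz"),
]

accuracy = cross_validate(samples, folds=5, k=3)
```

On real measurements the two features have different scales, so normalizing them before computing distances would be a natural refinement of this sketch.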
During the analysis of outliers, it was found that both methods incorrectly classified the same 10 GHz links as 5 GHz links. A more detailed examination of these links showed that they were the oldest models used in the network, whose characteristics are closer to those of the lower-quality 5 GHz links (which suffer from interference issues similar to the 2.4 GHz band [8]).
Wei et al. [9] used a similar classification method to identify access network types (Ethernet, ADSL, etc.), but with the median and the entropy of latency as the basic characteristics. We did not use them in our initial research because they are not produced by standard diagnostic tools such as ping, traceroute, mtr, and so forth.

Communication Architecture
A special communication architecture, M-client M-client server, suitable for supporting network troubleshooting assistance was proposed [10].
The proposed architecture combines the advantages of two architectural concepts: the classical client-server architecture and the peer-to-peer architecture. The client-server architecture with a central (sometimes replicated) server is used in ordinary operation, when the clients are directly accessible from the server. When the server is not available, the clients may switch to the peer-to-peer communication model and obtain the server's information by cooperating with surrounding accessible nodes.
This architecture allows network users to cooperate on a peer-to-peer basis in case of network failure. They are able to exchange important information regarding the failure, for example, information about planned outages or topology information specifying the point of failure.
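The fallback from client-server to peer-to-peer communication can be sketched as follows. This is only an in-memory illustration of the idea, not the proposed architecture itself: the classes `Server` and `Peer`, their methods, the outage record format, and the use of `ConnectionError` to model an unreachable server are all hypothetical.

```python
class Server:
    """Central node that stores and announces planned outages."""

    def __init__(self):
        self.outages = []
        self.reachable = True

    def announce(self, outage, peers):
        """Store the outage and push it to the peers that are currently online."""
        self.outages.append(outage)
        for peer in peers:
            peer.receive_announcement(outage)

    def list_outages(self):
        if not self.reachable:
            raise ConnectionError("server unreachable")
        return list(self.outages)


class Peer:
    """Client that caches announcements and can serve them to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.cache = {}      # outage id -> outage record
        self.neighbors = []  # peers that are directly reachable

    def receive_announcement(self, outage):
        self.cache[outage["id"]] = outage

    def fetch_outages(self, server):
        """Ask the server first; if it is unreachable, ask the neighbors."""
        try:
            outages = server.list_outages()
        except ConnectionError:
            outages = [o for peer in self.neighbors for o in peer.cache.values()]
        for outage in outages:
            self.cache[outage["id"]] = outage
        return sorted(self.cache.values(), key=lambda o: o["id"])


# Hypothetical scenario: bob was offline when the outage was announced.
server = Server()
alice, bob = Peer("alice"), Peer("bob")
bob.neighbors = [alice]
server.announce({"id": 1, "link": "r1-r2", "duration_hours": 2}, [alice])
server.reachable = False  # the outage has started, the server cannot be reached
known_to_bob = bob.fetch_outages(server)
```

After the fallback, `bob` holds the same outage record that `alice` cached at announcement time, including its duration.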

Conclusion
We have described the main use cases for network troubleshooting assistance services and focused on their prerequisite, topology discovery. Community wireless networks require a new approach to this complex task because common solutions using SNMP are not applicable here.
The core module for L3 topology discovery has been successfully implemented and tested as part of an application for visualisation of the OSPF network. This application fulfills all the described use cases for network administrators (except announcing planned outages). It is available at http://code.google.com/p/ospf-visualiser/.
A solution for cooperative L2 topology discovery based on the LLTD protocol has been proposed and is currently being tested.
Wireless link classification based on statistical analysis of average and minimal latency has been successfully evaluated and is to be integrated into the implemented application. The network model will be enriched with the obtained link types.
These new services bring a better user experience in community wireless networks, help network administrators with their common tasks, and allow rational further development of the network. Although the initial demand for these services came from the field of community wireless networks, their use is not restricted to this area.