Complex Network Analysis of Pakistan Railways

We study the structural properties of the Pakistan railway network (PRN), where railway stations are considered as nodes and edges represent trains directly linking two stations. The network displays small-world properties and is assortative in nature. Based on the betweenness and closeness centralities of the nodes, the most important cities with respect to connectivity are identified, which could help in identifying potential congestion points in the network.


Introduction
In recent years there has been rapidly growing interest in investigating the statistical and dynamical properties of network systems, which consist of a set of items called nodes or vertices, with edges representing the interactions between them. Examples include the Internet, the World Wide Web, social networks of acquaintance or other connections between individuals, organizational networks and networks of business relations between companies, neural networks, metabolic networks, food webs, distribution networks such as blood vessels or postal delivery routes, and networks of citations between papers.
Transportation networks are among the most important building blocks in the economic development of a country. The structure and performance of transportation networks reflect the ease of travelling and transferring goods among different parts of a country, thus affecting trade and other aspects of the economy. In recent years, complex network analysis has been used to study several transportation networks. These include airport networks (for instance, the airport networks of China [1,2], India [3], and the US [4], and the worldwide airport network [5,6]), urban road networks [7-9], and railway networks [10-14].
Railways are one of the most important modes of transportation around the world, and the topological properties of railway networks have attracted considerable attention. Sen et al. [12] were amongst the first to apply complex network theory to a railway network. In studying the statistical properties of the Indian railways, the authors introduced a new topological representation, the P-Space topology, in which stations or stops are nodes and two nodes are connected if at least one train stops at both stations. The authors also introduced a new method to calculate the shortest distance between two stations. Based on these calculations, the small-world properties and exponential degree distribution of the Indian railway network were identified. Majima et al. [15] extended this work by applying the same topology to the Japanese railway network and obtained the same statistical results. While these two networks exhibited the same properties in the P-Space representation, the Chinese railway network displayed the small-world properties of a short distance between stations and a high clustering coefficient, but with a power-law degree distribution [13]. In another attempt to explain the dynamic nature of the Chinese network, Guo and Cai [16] concluded that the network is scale-free when extracted in the L-Space topology. Similarly, Wang et al. [17,18] represented the railway network of China in both L-Space and P-Space and successfully fitted a power-law distribution in both cases.
The PRN is a moderately sized railway network with over 620 stations and 7,791 kilometers of track. Railways are the primary mode of intercity transportation in Pakistan, and the network carries a massive number of passengers and a large volume of freight. Even though railways play an important role in shaping the transportation sector of Pakistan, no research has so far examined the complex nature of this network. To the best of our knowledge, this is the first study to apply complex network theory to the PRN.

Network Construction
Before analyzing the PRN, the network topology must be defined. Two methodologies exist in the current literature for representing a transportation network, Space L [8,17] and Space P [8,12,18,19] (Figure 1). In Space L, nodes represent cities, bus, metro, or train stops, or sea ports, and a link between two nodes exists if they are consecutive stops on a route. The nodes in Space P are the same as in Space L, but here an edge between two nodes means that there is a direct bus, train, or metro route that links them. In other words, if a route $A$ consists of nodes $a_i$, that is, $A = \{a_1, a_2, \ldots, a_n\}$, then in Space P the nearest neighbors of the node $a_1$ are $a_2, a_3, \ldots, a_n$. The node degree $k$ in Space P is the total number of nodes reachable using a single route, and the distance can be interpreted as the number of transfers (plus one) one has to take to get from one stop to another. In Space L, by contrast, the node degree $k$ is just the number of directions one can take from a given node, while the distance equals the total number of stops on the path from one node to another [8,12]. In this study, we use the Space P methodology to represent the PRN, as it has already been used to represent railway networks [2,12,14]. The network was constructed from the official "Pakistan railways time table," kindly provided by Pakistan railways. The time table contained complete details of the railway stations, the number of trains, and the arrival and departure of each train at each station.
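The two representations can be illustrated with a minimal sketch. The routes below are hypothetical toy data, not taken from the Pakistan railways time table, and the helper names (`build_space_l`, `build_space_p`, `degrees`) are introduced here only for illustration.

```python
from itertools import combinations

# Hypothetical toy routes (ordered lists of stops); NOT real PRN data.
routes = [
    ["A", "B", "C", "D"],
    ["C", "E", "F"],
]

def build_space_l(routes):
    """Space L: link only consecutive stops on each route."""
    edges = set()
    for stops in routes:
        for u, v in zip(stops, stops[1:]):
            if u != v:
                edges.add(frozenset((u, v)))
    return edges

def build_space_p(routes):
    """Space P: link every pair of stops served by the same route."""
    edges = set()
    for stops in routes:
        for u, v in combinations(sorted(set(stops)), 2):
            edges.add(frozenset((u, v)))
    return edges

def degrees(edges):
    """Node degree from a set of undirected edges."""
    deg = {}
    for edge in edges:
        for node in edge:
            deg[node] = deg.get(node, 0) + 1
    return deg

space_l = build_space_l(routes)
space_p = build_space_p(routes)

# Station C lies on both routes: in Space L its degree counts the directions
# one can take from it, in Space P the stations reachable without a transfer.
print(len(space_l), degrees(space_l)["C"])
print(len(space_p), degrees(space_p)["C"])
```

In this toy example each route becomes a fully connected group of stations in Space P, which is why Space P graphs tend to have many more edges than their Space L counterparts.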

Topological Properties
Table 1 provides all computed network statistics, from basic network properties such as the number of nodes and edges to the more complex metrics such as clustering and assortativity.
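As a rough sketch of how the basic statistics and the degree statistics of the following subsection could be computed from an edge list, consider the snippet below. The edge list is a hypothetical toy graph and the function names are assumptions made for illustration; only the node and edge counts in the last lines (628 and 6,078) are the values reported for the PRN.

```python
from collections import Counter

def basic_stats(edges):
    """Degrees, node count, edge count and average degree 2E/N.

    `edges` is a list of (u, v) pairs of an undirected simple graph;
    isolated nodes are not represented in an edge list and are ignored.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n, m = len(deg), len(edges)
    return deg, n, m, 2 * m / n

def degree_distribution(deg):
    """P(k): fraction of nodes whose degree is exactly k."""
    n = len(deg)
    counts = Counter(deg.values())
    return {k: c / n for k, c in sorted(counts.items())}

def cumulative_distribution(pk):
    """Cumulative distribution: fraction of nodes with degree at least k."""
    cum, total = {}, 0.0
    for k in sorted(pk, reverse=True):
        total += pk[k]
        cum[k] = total
    return dict(sorted(cum.items()))

# Hypothetical toy edge list; NOT real PRN data.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
deg, n, m, avg_k = basic_stats(edges)
pk = degree_distribution(deg)
cum = cumulative_distribution(pk)
print(n, m, avg_k)
print(pk)
print(cum)

# Average degree implied by the node and edge counts reported for the PRN.
prn_avg_degree = 2 * 6078 / 628
print(round(prn_avg_degree, 2))
```

The cumulative form sums the tail of P(k), so sparse high-degree bins are pooled rather than plotted individually.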

Degree Distribution.
The degree of a node, a measure of its connectivity, is defined as the number of edges incident to it. Degree is one of the measures of centrality of a node in a network and it symbolizes the importance of the node. A commonly accepted rule is that the larger the degree of a node, the more important it becomes. The degree distribution $P(k)$ is an important feature that reflects the topology of the network and is defined as the fraction of nodes having degree $k$ in the network. However, the cumulative degree distribution is usually preferred: the degree distribution is often noisy, and there are rarely enough high-degree nodes to get good statistics in the tail of the distribution, whereas the cumulative distribution reduces the statistical errors due to the finite network size [14]. The cumulative degree distribution of the network is provided in Figure 2. As is evident from Figure 2, the railway network of Pakistan is a moderately connected network, with the majority of nodes having degrees of 29 or below, whereas a few stations have high degrees and act as hubs. Karachi, Lahore, Hyderabad, Kotri, Rawalpindi, and Peshawar are the most connected stations; however, they also pose a threat to the operations of the railway network, as a failure of one of these major stations can bring a major portion of the network to a halt. This has happened several times in the past, when a failure at one major station caused a major halt of railway operations in Pakistan.

Small World Properties.
Watts and Strogatz [20] proposed a model of small-world networks in the context of various social and biological networks. A small-world network is characterized as a network in which most nodes are not neighbors of one another, but most nodes can be reached from every other node in a small number of steps. Stated simply, a small world is a network having a small average shortest path length and a large clustering coefficient as compared to a random network with the same number of nodes. We apply the same method to see if small-world properties are present in the PRN.
The average shortest path length (the minimum number of edges passed through to get from one node to another, averaged over all pairs of nodes) is calculated using the following equation:
$$L = \sum_{s,t \in V} \frac{d(s,t)}{N(N-1)},$$
where $V$ is the set of nodes in the network, $d(s,t)$ is the length of the shortest path from $s$ to $t$, and $N$ is the total number of nodes in the network. A small average path length ($L = 3.2$, that is, roughly two intermediate stops or stations) means that there is connectivity among almost all the stations of the PRN, regardless of geographical distance. The network also features a small diameter (the maximum shortest path length in the network), $D = 5$.
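A minimal sketch of this calculation for an unweighted, connected graph uses breadth-first search from every node. The adjacency dictionary below is a hypothetical four-station chain, not PRN data, and the function names are chosen here for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_stats(adj):
    """Return (average shortest path length, diameter) of a connected graph."""
    n = len(adj)
    total, diameter = 0, 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())            # sum of d(s, t) over all t
        diameter = max(diameter, max(dist.values()))
    # Ordered pairs (s, t) with s != t number n * (n - 1).
    return total / (n * (n - 1)), diameter

# Hypothetical toy graph: a chain A - B - C - D.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
avg_len, diam = path_length_stats(adj)
print(avg_len, diam)
```

Running one breadth-first search per node costs on the order of N(N + E) operations, which is easily affordable for a network of a few hundred stations.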
The clustering coefficient $C_i$ of a node $i$ is defined as the ratio of the number of links shared by its neighboring nodes to the maximum number of possible links among them, that is, $C_i = 2e_i / (k_i(k_i - 1))$ for a node with $k_i$ neighbors and $e_i$ links among them. The average clustering coefficient is defined as
$$C = \frac{1}{N} \sum_{i=1}^{N} C_i.$$
Using the above equation, the average clustering coefficient $C$ of the network is calculated to be 0.97, indicating that the PRN is a highly clustered network. This result is substantially higher than the value for an equivalent Erdős-Rényi random graph [21], $C_{ER} = 0.02$. The high clustering coefficient together with the small average path length (see above) indicates that the PRN is indeed a small-world network.
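The definition can be sketched in a few lines. The graph below is a hypothetical toy example (a triangle with one pendant node), and assigning a coefficient of zero to nodes with fewer than two neighbors is a convention assumed here, not something stated in the text.

```python
from itertools import combinations

def local_clustering(adj, node):
    """C_i: links among the neighbours of `node` over the maximum possible."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbours
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """C: mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Hypothetical toy graph: triangle A-B-C plus a pendant node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(local_clustering(adj, "C"))
print(average_clustering(adj))
```

In a Space P graph every route contributes a fully connected group of stations, which is one plausible reason why clustering values close to 1 are common in this representation.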

Degree-Degree Correlation.
Another important topological characteristic of a network is the degree-degree correlation between connected nodes. A network is said to be assortative if high-degree nodes tend to connect to other high-degree nodes. Conversely, in disassortative networks, low-degree nodes tend to connect to high-degree nodes. Newman introduced a summary statistic for assortativity, $r$, in 2002 [22], defined as the Pearson correlation coefficient of the degrees at either end of an edge. Mathematically, this can be expressed as
$$r = \frac{1}{\sigma_q^2} \sum_{j,k} jk\,(e_{jk} - q_j q_k),$$
where
$$\sigma_q^2 = \sum_k k^2 q_k - \Bigl[\sum_k k\, q_k\Bigr]^2 .$$
Here $q_k$ is the distribution of the remaining degree (the degree of the node at the end of a randomly chosen edge, not counting that edge) and $e_{jk}$ is the joint probability distribution of the remaining degrees of the two nodes at either end of a randomly chosen edge. This statistic lies in the range $[-1, 1]$, where $-1$ indicates a completely disassortative network and $1$ indicates a completely assortative network. For the PRN, the assortativity is measured to be 0.34, indicating that high-degree nodes at one end of a link show a preference towards high-degree nodes at the other end. To support this result, the average degree of the nearest neighbors, $k_{nn}(k)$, for nodes of degree $k$ can be plotted using the following equation:
$$k_{nn}(k) = \sum_{k'} k' P(k' \mid k),$$
where $P(k' \mid k)$ is the conditional probability that an edge from a node of degree $k$ leads to a node of degree $k'$. If $k_{nn}(k)$ increases with $k$, the network is assortative. If $k_{nn}(k)$ decreases with $k$, the network is disassortative. Figure 3 shows the average degree of the nearest neighbors, and it can be seen that $k_{nn}(k)$ increases with degree $k$, consistent with a positive assortativity of 0.34.
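Both quantities can be sketched directly from their definitions. The snippet below computes the assortativity as the Pearson correlation of the degrees at the two ends of every edge (each undirected edge is counted in both directions), which is an equivalent way of evaluating the statistic since shifting degrees to remaining degrees does not change a correlation. The star graph is a hypothetical toy example and the function names are assumptions made for illustration.

```python
from collections import defaultdict

def degree_assortativity(adj):
    """Pearson correlation of the degrees at either end of an edge."""
    xs, ys = [], []
    for u in adj:
        for v in adj[u]:          # visits each undirected edge in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    m = len(xs)
    mean = sum(xs) / m            # xs and ys have the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    if var == 0:
        raise ValueError("all edge ends have the same degree; r is undefined")
    return cov / var

def knn_by_degree(adj):
    """k_nn(k): mean neighbour degree, averaged over all nodes of degree k."""
    buckets = defaultdict(list)
    for u, nbrs in adj.items():
        if nbrs:
            buckets[len(nbrs)].append(sum(len(adj[v]) for v in nbrs) / len(nbrs))
    return {k: sum(vals) / len(vals) for k, vals in sorted(buckets.items())}

# Hypothetical toy graph: a star with hub H and three leaves.
adj = {"H": {"a", "b", "c"}, "a": {"H"}, "b": {"H"}, "c": {"H"}}
r = degree_assortativity(adj)
knn = knn_by_degree(adj)
print(r)
print(knn)
```

A star is the extreme disassortative case: every edge joins the hub to a leaf, so $k_{nn}(k)$ falls as $k$ grows, the opposite of the trend reported for the PRN.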

Identifying the Major Stations in the PRN.
To identify the stations with high traffic and congestion, betweenness and closeness centralities are used. The betweenness centrality of a node $v$ is defined as the sum of the fractions of all-pairs shortest paths that pass through $v$. Mathematically,
$$c_B(v) = \sum_{s,t \in V} \frac{\sigma(s,t \mid v)}{\sigma(s,t)},$$
where $V$ is the set of nodes, $\sigma(s,t)$ is the total number of shortest paths between $s$ and $t$, and $\sigma(s,t \mid v)$ is the number of those paths passing through $v$ [23]. The top ten railway stations by betweenness centrality are given in Table 2. The station of Jacobabad leads the list as it acts as a link between three different provinces of Pakistan, including Sindh and Punjab.
Another parameter used to identify the major stations in the PRN is the closeness centrality, defined as the reciprocal of the average shortest distance from node $v_i$ to all the other nodes, which reflects how close the node is to the other nodes in the network. The mathematical expression is
$$C_C(v_i) = \frac{N - 1}{\sum_{j \neq i} d(v_i, v_j)},$$
where $d(v_i, v_j)$ is the shortest distance between $v_i$ and $v_j$, equal to the minimum number of stations from $v_i$ to $v_j$ in the network, and $(N - 1)$ is the normalization factor. Closeness centrality reflects how close one station is to all the other stations in the railway network: the larger the value, the greater the influence of the station and the wider its range of service. The top ten stations based on closeness centrality are listed in Table 3.
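As a hedged sketch of how the two centralities could be computed for an unweighted graph, the snippet below uses Brandes' accumulation scheme for betweenness and breadth-first search for closeness. The text does not say which algorithm or software was used, so this is only one possible implementation. The betweenness here is unnormalized and counts each unordered pair of stations once, which is an assumed convention, and the chain graph is hypothetical toy data.

```python
from collections import deque

def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                         # back-propagate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was visited from both endpoints; halve the totals.
    return {v: b / 2 for v, b in bc.items()}

def closeness_centrality(adj):
    """(N - 1) divided by the sum of shortest distances to all other nodes."""
    n = len(adj)
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        # Assumed convention: 0.0 if some node is unreachable from s.
        result[s] = (n - 1) / total if len(dist) == n and total > 0 else 0.0
    return result

# Hypothetical toy graph: a chain A - B - C - D.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
bc = betweenness_centrality(adj)
cc = closeness_centrality(adj)
print(sorted(bc.items(), key=lambda kv: -kv[1]))
print(sorted(cc.items(), key=lambda kv: -kv[1]))
```

Sorting the resulting dictionaries in descending order and keeping the first ten entries would give rankings of the kind shown in Tables 2 and 3.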

Conclusion
In this paper we have studied the PRN as an unweighted graph of railway stations. The network clearly displays small-world properties and is assortative in nature. The betweenness and closeness centralities of the stations are also computed, and the top-ranked stations are identified as potential congestion points. As public transportation, especially railways, provides a crucial mode of movement for passengers, the identification of possible congestion stations may play an important role in identifying the limitations of the network. Although this study contributes a complex network analysis of the physical state of the PRN, given the availability of passenger and cargo flow data, it would also be interesting to study the weighted network, as it could reveal a clearer picture of network dynamics in terms of passenger and cargo flow. Such a study would not only reveal the topological aspects but also provide a detailed insight into the network dynamics by identifying the stations with greater flow, the correlations of the edge weights with the degree of the vertices, and especially the eigenvector centrality, where the quality of an edge also matters.

Figure 1: Explanation of Space L and Space P.

Table 1: Computed properties of Pakistan railways network.
The PRN comprises $N = 628$ nodes and $E = 6{,}078$ edges representing the direct links among stations. The average degree of the network is thus $2E/N = 19.36$, which indicates the average number of stations reachable from an arbitrary station via a single train.

Table 2: Betweenness centrality of top ten stations.

Table 3: Closeness centrality of top ten stations.