Using Open Big Data to Build and Analyze Urban Bus Network Models within and across Administrations

Jiangsu Institute of Urban Planning and Design, Nanjing 210036, China; School of Architecture and Urban Planning, Nanjing University, Nanjing 210093, China; Key Laboratory of Watershed Geographic Sciences, Nanjing Institute of Geography and Limnology, CAS, Nanjing, China; Wuhan Land Use and Urban Spatial Planning Research Center, Wuhan 430014, China; School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430072, China


Introduction
Prioritizing urban bus networks (UBNs) has become an important way of solving many urban transportation problems, such as alleviating traffic congestion [1] and reducing emissions. In general, the urban bus is the most basic and important mode of transportation for urban and rural residents and is much more environmentally friendly than other modes of transportation [2]. Additionally, improving the service capacity of UBNs has become an important part of the current construction of livable cities promoted by the Chinese government. In this respect, an in-depth topological and statistical analysis of UBNs is of fundamental importance for evaluating the capacity of current urban public services [3,4]. This has also become an indispensable part of urban planning.
In recent years, complex network theory [5,6] has become the most common and effective method for the construction and analysis of UBN models, and many metrics have been used to evaluate the characteristics of UBNs. For example, Wang et al. [4] studied the spatial configuration of the UBN in the city of Shenzhen, China, adopting the metrics of degree centrality, betweenness centrality, clustering coefficient, and average path length. Xu et al. [7] found that the degree distributions of all UBNs of 330 cities in China were approximated by exponential distributions. Zhang et al. [8] explored the structural characteristics of dynamic weighted UBNs by using complex network theory.
In general, complex network theory has been used widely in previous studies. The investigation of UBNs has been a long-standing research topic. It has been conducted in various ways, focusing on issues such as spatial configuration [9], design and optimization [10,11], and network structure [12]. However, the data used for the construction and analysis of UBNs came mainly from local government agencies, whose data standards may vary from one city to another. This prevents comparison across cities, even though such comparisons are very important in urban planning. Moreover, these data are typically not readily available due to business privacy, which poses another constraint for UBN research. With the expansion of urban transportation systems, UBNs have become very large and complicated, and there is high demand for the automatic construction of UBN models from open big data [3,13]. Therefore, it is necessary to establish a framework for open big data acquisition and processing in UBN research.
Open big data has become available and has attracted intensive attention from the field of urban planning [4,13], thanks to advances in Internet and Communication Technologies. Many studies have reported fruitful results using various types of data, such as high-speed railway data [14,15], metropolitan rail transport data [16], aviation flight data [17][18][19][20], patent data [21], shipping track data [22,23], and subway credit card data [24]. These data contain nodes with geographic coordinates and information about the strength of the connections between them. However, building UBNs from open big data in China raises four data problems that have not been noticed before but are important for developing the research model. In this respect, this study has two notable features regarding data processing: (1) it has a highly automated capability to collect open big data, which provides a consistent way of analyzing UBNs across different regions; (2) it considers how to merge bus stations according to the requirements of urban planning and the accuracy of data calculation.
Previous studies have mainly focused on the analysis of UBNs in metropolitan areas. For example, two studies [25,26] pointed out that UBNs operated inefficiently in metropolitan areas. However, there is a lack of evaluation and research on UBNs at the county level, and in particular, very few studies have examined UBNs across different counties. In fact, due to the rapid development of Chinese cities, integrated development and travel demand across different regions are becoming more and more important and urgent. Thus, the evaluation and design of the current UBNs at different levels becomes an important aspect of the sustainable development of Chinese cities in the future. In addition, the current UBNs might have a few problems, which are often discussed in line with the issues of sustainable urban development, such as the lack of support for urban commuting [27] and the inefficient layout of feeder bus routes [28][29][30].
More importantly, this study paid special attention to the analysis of cross-county UBNs, which have rarely been investigated in the existing literature. In particular, this study aimed to answer the question of whether the service quality of cross-county UBNs is very poor. For example, cross-county UBNs tend to have the problem of "broken links," which refers to bus lines that do not have access to other established UBNs in a small area. Furthermore, the analysis of UBNs in many large cities of China has been mainly used for network optimization instead of network expansion. In order to meet the needs of sustainable urban development, it is challenging to relocate bus stations, but it is relatively convenient to optimize the bus lines [31,32]. Thus, this study not only focused on the analysis of bus stations but also emphasized the important role of bus lines in the construction and analysis of UBN models [33]. The remainder of this paper is structured as follows. Sections 2 and 3 introduce the materials and methodology. In Section 4, the results of two case studies are provided. Finally, a discussion of the results and conclusions are presented in Section 5.

Study Areas.
The study areas of the two case studies are shown in Figure 1, and the data were all obtained from the Gaode map (a Chinese map service provider, comparable to Google and TomTom) in May 2019. The first case is a UBN analysis in a county-level administration, while the second case is a UBN analysis across different counties.
In the first case, the major features of the UBN are shown in an independent county-level administrative area. This case study area is Kunshan, Jiangsu Province, China, which neighbors Shanghai and is one of the top 10 economically strongest counties in China. It is selected because its UBN covers the entire administrative area. This case is representative for studying the spatial equality of UBNs, so that more people can enjoy better public transport services in small and medium cities.
In the second case, a cross-boundary area with three adjacent county-level administrative units was selected from different provinces in the Yangtze River Delta metropolitan area. They are Jiashan County of Zhejiang Province, Wujiang District of Jiangsu Province, and Qingpu District of Shanghai. It is worth mentioning that the UBN in each administrative unit has been developed independently and the connections across the three administrative districts through their UBNs are relatively weak. The Yangtze River Delta is one of the three most developed urban agglomerations in China and its integrated development across different administrations has been strengthened by market forces and promoted by the central government. Thus, the second case is of great significance to the integrated development of cross-county UBNs in China's urban planning.

Data Problems in the Analysis of Urban Bus Networks.
As shown in Figure 2, there are four major problems for the analysis of UBNs using open big data.
As shown in Figure 2(a), a bus station may be represented by multiple adjacent geospatial points of the same name. This is particularly true for a transfer station, where two or more bus lines cross each other. In this case, the records of the same transfer station obtained from different bus lines might have different geographical locations that are not completely coincident. In fact, this transfer station should be unique.
As shown in Figure 2(b), some bus stations might be geographically far away from each other in a large study area, although they have the same name. Therefore, these stations have to be treated as different ones.
As shown in Figure 2(c), there are small differences in the names of the same bus station on different bus lines. These bus stations are adjacent or coincident, and they should be treated as the same bus station.
As shown in Figure 2(d), some bus stations are actually not the same station, but they are very close to each other. There are many reasons for this situation. For example, multiple bus stations are built in a high-speed railway station, or they are built in a small area to meet the needs of bus line stops. For urban planning, these stations need to be merged into one bus station to observe the bus service capability of a geospatial entity, such as bus stations in the high-speed railway station areas.

Two Types of Urban Bus Network Model.
The first type of UBN model is shown in panel (a) of Figure 3, which is typically known as the "Line-Station" representation. In this network, bus stations are network nodes, bus lines between bus stations are network edges, and the number of bus lines can be used as the weight value of the network edge.
The Line-Station-based UBN model is conventionally used to examine the characteristics of bus stations while ignoring the bus lines, for instance, to identify a bus station with an important transit function in the network [34]. In addition, these studies tend to use the P-space to establish the complex network [3].
The second type of UBN model is shown in panel (b) of Figure 3, which is known as the "Line-Line" representation. In this network, bus lines are modeled as network nodes, two bus lines passing through the same bus station are connected by a network edge, and the number of bus stations shared by the two bus lines is taken as the weight value of the network edge. The Line-Line-based UBN model is more concerned with the bus lines, and hence it is more practical for the planning and management of bus lines in public transport [33].
In addition, the "Line-Line" model is often better than the "Line-Station" model in data visualization, because the bus line object is clearer and more significant than the bus station object in graphic representations. In other words, when applying a "Line-Line" model to evaluate UBNs across different counties, it is much easier to detect which bus lines cross different administrations and where the "broken links" exist. In the first case study, because of the high level of urban-rural integrated development in Kunshan, the analysis of the cross-administrative issue is not very important. Therefore, the "Line-Station" model is used for the analysis of the UBN in this case, while the "Line-Line" model is used for the analysis of UBNs in the second case.

Method of Analyzing Urban Bus Networks Using Open Big Data.
As shown in Figure 4, the method to analyze a UBN consists of three steps.

Step 1: Collecting Data on Bus Lines and Bus Stations Automatically.
Firstly, the paper selects an online map provider that can provide abundant point of interest (POI) data over the study area. This is because the POI of a bus station records the information of all bus lines passing through it. Secondly, it collects the POIs of the bus station type via the application programming interfaces (APIs) provided by the map provider. Thirdly, the names of bus lines are extracted from these POIs and are then used to crawl the detailed information of both bus lines and bus stations in a city. In this way, all the information on bus lines and bus stations can be collected automatically through the official data acquisition API by simply providing the city name of the study area.
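The third sub-step, pulling bus line names out of the station POIs, can be sketched as follows. This is only a minimal illustration under stated assumptions: the record layout (a `name` field and a `lines` field holding a semicolon-separated string), the function name `extract_line_names`, and the sample values are all hypothetical, and the actual response format of a map provider's API may differ.

```python
def extract_line_names(pois):
    """Collect the distinct bus line names mentioned by station POIs.

    Each POI is assumed (hypothetically) to be a dict whose 'lines'
    value is a semicolon-separated string such as "Line 1;Line 25".
    """
    names = set()
    for poi in pois:
        for raw in poi.get("lines", "").split(";"):
            name = raw.strip()
            if name:  # skip empty fragments
                names.add(name)
    return sorted(names)


# Hypothetical sample records, not real map data.
sample_pois = [
    {"name": "Station A", "lines": "Line 1;Line 25"},
    {"name": "Station B", "lines": "Line 25; Line 7"},
    {"name": "Station C", "lines": ""},
]
line_names = extract_line_names(sample_pois)
```

The resulting list of unique line names would then drive the per-line queries that return the ordered stations of each line.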

Step 2: Processing the Collected Bus Network Data Using Spatial Constraints.
In this step, this paper proposes a solution to cope with the four problems illustrated in Figure 2. As shown in Figure 5, firstly, it merges the bus stations with the same name within a certain distance (e.g., 250 m) of each other into one bus station. Secondly, it distinguishes bus stations with the same name but a long distance from each other (e.g., >250 m) by adding a suffix identifier, so that thereafter these bus stations have different names. Thirdly, it merges the bus stations within a certain distance (e.g., 100 m) of each other into one bus station, regardless of their names.
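The three operations above can be sketched as follows. This is a rough illustration, not the paper's implementation: it assumes stations are given as `(name, (lon, lat))` records, uses the haversine distance, groups points by single linkage with a union-find structure, and names a merged station after its first member. The function names, the `#k` suffix format, and the sample coordinates are all hypothetical.

```python
import math
from collections import defaultdict


def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))


def group(items, threshold_m, same_name_only):
    """Single-linkage grouping of (name, (lon, lat)) items; returns index lists."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if same_name_only and items[i][0] != items[j][0]:
                continue
            if haversine_m(items[i][1], items[j][1]) <= threshold_m:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(items)):
        groups[find(i)].append(i)
    return list(groups.values())


def centroid(coords):
    """Plain average of lon/lat, adequate for points a few hundred metres apart."""
    return (sum(c[0] for c in coords) / len(coords),
            sum(c[1] for c in coords) / len(coords))


def process_stations(records, same_name_m=250.0, any_name_m=100.0):
    """Return the final station name assigned to each input record."""
    # Operation 1: merge same-name points that lie close together.
    same_name_groups = group(records, same_name_m, same_name_only=True)
    # Operation 2: same-name groups that stay apart get a suffix identifier.
    per_name = defaultdict(int)
    for g in same_name_groups:
        per_name[records[g[0]][0]] += 1
    seen = defaultdict(int)
    merged = []  # one (unique name, centroid) pair per group
    for g in same_name_groups:
        name = records[g[0]][0]
        if per_name[name] > 1:
            seen[name] += 1
            name = f"{name}#{seen[name]}"
        merged.append((name, centroid([records[i][1] for i in g])))
    # Operation 3: merge nearby stations regardless of name.
    final = [None] * len(records)
    for g in group(merged, any_name_m, same_name_only=False):
        label = merged[g[0]][0]  # keep the first member's name
        for k in g:
            for i in same_name_groups[k]:
                final[i] = label
    return final


# Hypothetical records: (name, (lon, lat)).
records = [
    ("Central Station", (120.9800, 31.3800)),
    ("Central Station", (120.9800, 31.3810)),        # about 111 m from the first
    ("Central Station", (120.9800, 31.4800)),        # about 11 km away
    ("Central Station East", (120.9800, 31.38055)),  # a few metres from the merged pair
    ("Far Park", (121.1000, 31.3000)),
]
final_names = process_stations(records)
```

Single linkage can chain together points that are individually within the threshold but span a longer distance overall; whether that is acceptable depends on the planning purpose, so the thresholds should be treated as tunable parameters.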

Step 3: Building Two Types of Urban Bus Network Using the Processed Data.
In this step, two types of UBN are built, namely, the "Line-Station" network and the "Line-Line" network. It should be noted that both networks are modeled as directed weighted networks and that they are built from the new bus station and bus line data generated in Step 2.
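To make the two representations concrete, the sketch below builds both edge lists from a toy set of routes. It rests on assumptions not fixed by the text: Line-Station edges connect consecutive stops along a line, and Line-Line edges are stored once per unordered pair of lines for brevity, even though the paper models both networks as directed. The function names and the `routes` data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_line_station(routes):
    """Directed station-to-station edges, weighted by the number of lines using them."""
    weights = defaultdict(int)
    for stops in routes.values():
        # A set ensures each line counts at most once per directed edge.
        for a, b in set(zip(stops, stops[1:])):
            if a != b:
                weights[(a, b)] += 1
    return dict(weights)


def build_line_line(routes):
    """Edges between lines that share stations, weighted by the number shared."""
    weights = {}
    for (l1, s1), (l2, s2) in combinations(sorted(routes.items()), 2):
        shared = set(s1) & set(s2)
        if shared:
            weights[(l1, l2)] = len(shared)
    return weights


# Hypothetical routes: line name -> ordered list of processed station names.
routes = {
    "Line 1": ["A", "B", "C", "D"],
    "Line 2": ["B", "C", "E"],
    "Line 3": ["F", "G"],
}
line_station = build_line_station(routes)
line_line = build_line_line(routes)
```

In this toy input, "Line 3" shares no station with the other lines and therefore appears as an isolated node in the Line-Line network, which is the kind of disconnection the cross-county analysis looks for.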

Node Degree Centrality.
The node degree centrality (DC) is defined as the number of edges incident on the current node. It reflects the number of direct neighbors of the current node, and it is mathematically defined in (1), where $L_{ij}$ reflects the connection between node $i$ and node $j$ (1 if they are connected, 0 otherwise) and $n$ represents the total number of nodes [35][36][37]. One has

$$DC_i = \sum_{j=1}^{n} L_{ij}. \tag{1}$$

Node Strength Centrality.
The node strength centrality, which is also called "Weighted Degree Centrality" (WDC), is defined as the sum of the weights of the edges incident on the current node [35]. It reflects the intensity of interactions between the current node and its neighboring nodes, and it is mathematically defined in (2), where $W_{ij}$ is the weight of the edge between nodes $i$ and $j$ and $n$ represents the total number of nodes. One has

$$WDC_i = \sum_{j=1}^{n} W_{ij}. \tag{2}$$
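As a small worked example of the two definitions, the sketch below computes DC and WDC from a weight matrix. The matrix `W` is a made-up symmetric example, the function name is hypothetical, and an entry greater than zero is taken to mean that an edge exists; for a directed network the row sums used here would give out-degree and out-strength.

```python
def degree_and_strength(W):
    """Return (DC, WDC) lists for a weight matrix W, where W[i][j] > 0 marks an edge."""
    n = len(W)
    dc = [sum(1 for j in range(n) if j != i and W[i][j] > 0) for i in range(n)]
    wdc = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]
    return dc, wdc


# Hypothetical 4-node network; W[i][j] is the number of bus lines on edge (i, j).
W = [
    [0, 2, 1, 0],
    [2, 0, 3, 0],
    [1, 3, 0, 1],
    [0, 0, 1, 0],
]
dc, wdc = degree_and_strength(W)
```

Nodes 0 and 1 have the same number of neighbors but different strengths, which illustrates why the weighted measure separates stations that the unweighted one treats as equal.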

Weighted Betweenness Centrality.
Weighted betweenness centrality (WBC) is defined from the number of shortest paths between pairs of nodes that pass through the current node, taking the edge weights into account. It is used to measure the importance of nodes serving as bridges in the network [38,39], and it is mathematically defined in (3), where $s$ and $t$ form a node pair in the node set $V$; $\sigma(s, t)$ indicates the number of weighted shortest paths between node $s$ and node $t$; and $\sigma(s, t \mid i)$ is the number of weighted shortest paths between node $s$ and node $t$ passing through node $i$. One has

$$WBC_i = \sum_{s, t \in V,\; s \neq i \neq t} \frac{\sigma(s, t \mid i)}{\sigma(s, t)}. \tag{3}$$
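One common way to evaluate this quantity is Brandes' algorithm with Dijkstra searches, sketched below using only the standard library. Two assumptions are made here that the text does not settle: each edge value is treated as a positive path cost (how bus-line counts are converted to costs is left open), and the result is left unnormalized, summed over ordered node pairs. The graph is a hypothetical four-node ring.

```python
import heapq


def weighted_betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm; adj[u][v] is a positive cost.

    Every node must appear as a key of adj. Ordered pairs (s, t) are counted,
    so an undirected graph stored symmetrically counts each pair twice.
    """
    nodes = list(adj)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        dist = {s: 0.0}
        sigma = dict.fromkeys(nodes, 0)  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        done, order = set(), []
        heap = [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue  # stale heap entry
            done.add(v)
            order.append(v)
            for w, cost in adj[v].items():
                nd = d + cost
                if w not in dist or nd < dist[w]:
                    dist[w] = nd
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w] and w not in done:
                    sigma[w] += sigma[v]  # another equally short path
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest node back to s.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical ring A-B-D-C-A with unit costs, stored symmetrically.
adj = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "D": 1},
    "D": {"B": 1, "C": 1},
}
wbc = weighted_betweenness(adj)
```

In the ring, every pair of opposite nodes has two equally short paths, so each intermediate node receives half a path per direction; exact equality tests on path lengths are reliable here because the costs are integers.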

Community Detection.
The community detection method can partition an entire network into tightly connected subnetworks, and a community can be understood as a class of nodes with similar characteristics. There are many different types of community detection methods in the literature, and the most commonly used one is the modularity-based method. Modularity is a measure of the degree to which a network's communities may be separated and recombined, and it is a commonly used criterion for partitioning a network into a certain number of communities [40,41]. The larger the modularity value, the better the division of the community structure. In real-world systems, the value of modularity usually ranges from 0.3 to 0.7 [42].
This paper employs a method based on modularity optimization [43,44], which partitions the network into a number of distinct modules if there is clear modularity in the network.
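To show what a modularity value measures, the sketch below evaluates the standard (Newman) modularity Q of a given partition. It does not perform the optimization used in the cited method, and it assumes an undirected weighted graph for simplicity, whereas the networks in this paper are directed. The toy graph, two triangles joined by one edge, and the function name are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of an undirected weighted graph for a given partition.

    edges: {(u, v): weight}, each undirected edge listed once.
    community: {node: community label}.
    """
    total = sum(edges.values())  # m, the total edge weight
    internal = {}    # weight of edges inside each community
    degree_sum = {}  # summed weighted degree of each community
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0.0) + w
        degree_sum[cv] = degree_sum.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(internal.get(c, 0.0) / total - (d / (2 * total)) ** 2
               for c, d in degree_sum.items())


# Two triangles (1-2-3 and 4-5-6) joined by the single edge 3-4.
edges = {(1, 2): 1, (2, 3): 1, (1, 3): 1,
         (4, 5): 1, (5, 6): 1, (4, 6): 1,
         (3, 4): 1}
two_groups = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_group = dict.fromkeys(range(1, 7), "a")
q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles gives Q = 5/14 (about 0.357), while putting all nodes in one community gives Q = 0, so a modularity-based method would prefer the split.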

Case Study 1: Data Extraction and Analysis of an Urban Bus Network in a Single Administrative Area.
This case study used the "Line-Station" type of UBN model, which mainly shows the overall characteristics of the spatial distribution of bus stations in a single administrative area. Figure 6(a) displays the spatial distribution of the WDC of bus stations. Specifically, Yushan Town has the strongest bus service capability and shows a trend of merging with the Economic Development Zone, while the bus service capability in the other towns is relatively weak. Figure 6(b) represents the spatial distribution of the WBC of bus stations, which can be used to understand the transit capacity of bus stations in the study area. For instance, bus stations with high WBC values in Yushan Town are likely to be densely distributed, while those with low WBC values in other towns are relatively sparsely distributed. In these respects, the UBN of Kunshan is mainly concentrated and fully developed in the town of Yushan.
Figure 7 displays the community structure of the UBN, which can be used to examine the demarcation of transportation space. From this figure, the UBN is organized into 13 communities with a certain degree of spatial coherence. The modularity value is 0.613, which shows that the community pattern is satisfactory. More importantly, the spatial organization characteristics of the UBN in Kunshan can be seen in Figure 7: two communities in the south of Kunshan can be clearly demarcated, which are related to five towns. For instance, the towns of Zhangpu, Jinxi, and Zhouzhuang are in the same community (community 3), while the towns of Qiandeng and Dianshanhu are in another community (community 10). Communities in the north of Kunshan are intertwined and are covered by six towns. For instance, the pattern is much more intricate for the town of Yushan and its four neighboring towns.
Furthermore, this study calculated the proportion of bus stations of each community in different administrative units. Table 1 shows the distribution of the proportion of bus stations of each community in each administrative unit, and two things become apparent. First, Zhouzhuang Town, Jinxi Town, and Dianshanhu Town in the south of Kunshan are each composed of a single community, and thus the spatial structure of the subnetwork in each administrative unit is relatively simple. Second, other towns are composed of multiple communities, especially Yushan Town and the Economic Development Zone, where the spatial structure of the subnetwork is relatively complicated. These community patterns may be roughly explained by the socioeconomic development of the towns. The socioeconomic level in the northern part of Kunshan is high, and thus the connections via the UBN among these towns are relatively strong. However, in the south of Kunshan, there are many water towns and tourist towns, which have a relatively lower level of socioeconomic development. Hence, the connections via the UBN are likely to be concentrated on a single town or a few towns to satisfy the needs of specific industries. At the center of the city, Yushan Town and the Economic Development Zone are composed of many subnetworks to meet the different needs of urban bus travel.
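The kind of cross-tabulation behind such a table can be sketched as follows. This is an illustration only: it assumes each station is already labeled with its administrative unit and community, it reports the share of each unit's stations that falls in each community (one possible reading of the proportions described), and the unit names and labels are placeholders rather than values from Table 1.

```python
from collections import Counter, defaultdict


def community_shares(assignments):
    """Share of each unit's stations falling in each community.

    assignments: iterable of (administrative_unit, community_id), one per station.
    Returns {unit: {community_id: share}}.
    """
    counts = defaultdict(Counter)
    for unit, comm in assignments:
        counts[unit][comm] += 1
    return {unit: {c: k / sum(cnt.values()) for c, k in cnt.items()}
            for unit, cnt in counts.items()}


# Placeholder labels, not results from the case study.
assignments = [
    ("Town A", 3), ("Town A", 3), ("Town A", 3),
    ("Town B", 1), ("Town B", 1), ("Town B", 1), ("Town B", 2),
]
shares = community_shares(assignments)
```

A unit whose stations all fall in one community (here "Town A") corresponds to the simple subnetwork structure described above, while a unit spread over several communities corresponds to the more complicated one.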
Figure 5: The three processing operations: merge the bus stations of the same name within a certain spatial distance; distinguish the bus stations with the same name but a long distance apart; merge the bus stations within a certain distance.

Case Study 2: Data Extraction and Analysis of Urban Bus Networks across Administrations.
This case study used the "Line-Line" type of UBN model, which is suitable for assessing the development of UBNs across different counties. This paper analyzes this issue from four aspects.
First, Figure 8 displays the spatial distribution of the WDC of the bus lines. Bus lines with high WDC values are mainly distributed in Jiashan and Qingpu, while those with low values are mainly located in Wujiang. Additionally, there are very few bus lines with high WDC values across different administrative counties. That is to say, bus lines with the highest WDC values tend to be confined to one administrative county, and they can be affected by the spatial layout of that county.
Second, Figure 9 displays the spatial distribution of the WBC of bus lines, which can be used to identify the hub-type bus lines in the study area. Bus lines with a strong hub function are mainly distributed in the junction of the three administrative regions, which indicates that a certain number of hub bus lines across counties have been formed in the study area. However, the number of these bus lines is still very small compared with the majority of nonhub bus lines. Besides, many bus lines are not connected in the junction of administrative districts, which is typically known as the "broken link" phenomenon in urban planning [45]. Particularly, there are no bus lines connecting Jiashan and Qingpu, which might hinder integrated regional development.
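A coarse check for missing cross-boundary connections can be sketched as below: given the set of administrative units each bus line serves, it lists the pairs of units that no single line connects. This is only a rough proxy for the "broken link" phenomenon described above, and the function name, line names, and unit names are hypothetical placeholders rather than data from the study.

```python
from itertools import combinations


def unconnected_unit_pairs(line_units, units):
    """Return the pairs of units that are not served together by any single line.

    line_units: {line name: set of administrative units the line passes through}.
    units: iterable of all administrative units in the study area.
    """
    connected = set()
    for served in line_units.values():
        for pair in combinations(sorted(served), 2):
            connected.add(pair)
    return [p for p in combinations(sorted(units), 2) if p not in connected]


# Placeholder input, not the lines of the case study.
line_units = {
    "Line X": {"County A", "County C"},
    "Line Y": {"County B", "County C"},
    "Line Z": {"County B"},
}
missing = unconnected_unit_pairs(line_units, ["County A", "County B", "County C"])
```

A pair reported here has no direct bus line, although passengers might still travel between the two units by transferring in a third one.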
Third, Figure 10 displays the community structure of the UBNs, which can be used to examine the demarcation of transport space. This figure suggests that the UBNs can be divided into nine communities with a high degree of spatial coherence. The modularity value is 0.758, which shows that the community pattern is reasonable and satisfactory. However, there are no obvious cross-administrative communities, which further indicates that the UBN of each administrative region is relatively independent.

Figure 6: The spatial distribution of WDC and WBC and their spatial trends are illustrated by kernel density estimation.

Complexity
Fourth, to evaluate the service capacity of the UBNs across different counties, this study calculated the proportions of bus lines with respect to different administrative units in each community. As shown in Table 2, there are only two communities covering two or more administrative regions, among which community 1 spans two administrative regions

Conclusion
The contributions and limitations of this study are as follows. Firstly, data acquisition has always been a bottleneck in the study of cross-county UBNs. In other words, how to obtain reliable data efficiently is a persistent problem. To cope with this problem, this paper demonstrated a methodological framework to analyze UBNs using open big data. These open big data are collected from the same data source, which supports the reliability of a cross-comparison of the structures and organizations of UBNs in different counties. This is of fundamental importance for urban planning. Secondly, UBNs were represented with the "Line-Station" model in many previous studies, which took the bus station as the network node. In this study, it is much more useful to represent the UBN with the "Line-Line" model, where the bus line is taken as the network node. This is because it is much easier and more convenient to adjust bus lines than to retrofit or build new bus stations. In addition, this type of UBN representation is much more effective in the analysis of the relationship between different regions in a study area, which can assist a comprehensive and objective judgment in the evaluation of UBNs. Thirdly, this study explored the application of the methodological framework to two case studies, which might provide explicit implications from the perspective of urban planning. In the first case study, the overall spatial characteristics of bus stations were evaluated in a single county, where the "Line-Station" model was adopted. However, this model is very limited in exploring the spatial relationship among subnetworks. In the second case study, this study analyzed UBNs across different counties, where the "Line-Line" model was adopted. Specifically, the "Line-Line" model can enhance the understanding of the integration and development of UBNs between different counties.
Furthermore, for urban planning and design, the contributions of this study are mainly focused on the following three aspects: (1) The analytic results can attract more scholars and urban planners to examine the spatial characteristics of UBNs in counties, not just in big cities, using open big data. This can help to improve the spatial equity level of public transport services in many areas of rural China. (2) The study can help urban planners to identify the practical problems of cross-county UBNs through a standardized technical approach. (3) It can also provide data sources and methodological support for fine-grained urban management, such as helping government officials to evaluate urban bus stations and lines dynamically and to improve the UBN service quality continuously.
Limitations of this study are also highlighted for future studies. First, this study only analyzed UBNs and did not consider other transport modes, such as subway networks. Second, other types of open big data need to be fused to improve the capacity to capture urban planning problems [46]. Third, the impact of using different weight properties for the UBN analysis deserves more attention for more diverse applications [47]. Fourth, the accessibility and spatial equality [48,49] of urban bus stations are also important for future UBN analysis.
Overall, this paper provides a methodological framework for building and analyzing UBN models using open big data, which is valuable for the planning and management of urban public transportation facilities. This framework was applied in two case studies, where the structure and organization of UBNs were examined and analyzed from an urban planning perspective. The analytic results can be valuable for urban planners and government agencies in understanding the sharing of public services across different counties, managing UBNs in an effective way, and recognizing the importance of county-level bus networks in China.

Data Availability
Data are made available upon request to the corresponding author.

Conflicts of Interest
The authors declare no conflicts of interest.