A Street-Level IP Geolocation Method Based on Delay-Distance Correlation and Multilayered Common Routers



Introduction
Thanks to the rapid growth of mobile multimedia services like online video and remote conferencing on smart mobile devices (e.g., cellphones and tablets), the fifth-generation (5G) mobile and wireless communication systems are in great demand all over the world [1][2][3]. How to manage the trust relations between users and multimedia content providers in the 5G network is an important problem [4]. Previous works like [4] point out that the geographical locations of users or content providers are important information for detecting unauthenticated or malicious devices. IP geolocation can find the geographical location of Internet hosts as well as smart devices based on their IP addresses [5]. Besides authentication, IP geolocation also helps law enforcement organizations and government agencies identify the geographical location of cyberattacks or online fraud [6].
Existing IP geolocation methods can be categorized into two kinds by accuracy: city-level IP geolocation and street-level IP geolocation. City-level IP geolocation aims to find the city where the target IP is located. The median error distance of the main city-level IP geolocation methods is between tens and hundreds of kilometers. After obtaining the city-level location information, street-level IP geolocation methods can be used to find the specific street, community, or organization where the target IP is located, of which the median error distance is usually less than 10 kilometers.
There are three main street-level IP geolocation methods: Checkin-Geo [16], Geo-NN [17], and Wang-Geo [18]. Besides these three methods, IP databases can also provide [...] the more websites are deployed in cloud services, the geographical location of a website is not necessarily related to its company. Therefore, the number of classical web landmarks that the Wang-Geo method relies on becomes more and more limited. This has also had a non-negligible influence on the performance of the Wang-Geo method in recent years.
To obtain more accurate measurement-based IP geolocation results in weakly connected networks, this paper proposes Corr-SLG, an IP geolocation algorithm based on relative-delay-distance correlation and multilayered common routers. Corr-SLG includes two parts: landmark collection and IP geolocation.
In the stage of landmark collection, besides the classical web landmark collection method, we present a new street-level landmark collection method called WiFi (Wireless Fidelity) landmark. This method collects landmarks based on the diversely distributed WiFi access points and the accurate geographical location information of smartphones.
In the stage of IP geolocation, to find out which landmark is nearest to the target IP in a weakly connected network, Corr-SLG divides landmarks into three groups based on the relative-delay-distance correlation. In the group where the delay-distance correlation is strongly positive (i.e., near 1), the landmark which has the smallest delay to the target IP is selected as the candidate landmark (i.e., the landmark nearest to the target IP). In the group where the delay-distance correlation is strongly negative (i.e., near −1), the landmark which has the largest delay to the target IP is selected as the candidate landmark. In the third group, the candidate landmark is selected randomly. To introduce more landmarks that are near the target IP into the selection procedure of candidate landmarks, Corr-SLG selects candidate landmarks not only from the closest common router layer but also from the other common router layers. The experiments in Zhengzhou, a provincial capital city of China, show that Corr-SLG can increase the accuracy of street-level IP geolocation by about 38.59%.
The rest of the paper is organized as follows. In Section 2, we show that the two key assumptions of Wang-Geo are not always true in two real-world networks. Then, we present the landmark collection method in Section 3. Section 4 introduces the IP geolocation algorithm of Corr-SLG. Section 5 shows the experiment results. The paper is concluded in Section 6.

Two Assumptions of Wang-Geo Method
The Wang-Geo method is a typical street-level measurement-based IP geolocation method. In this section, we first introduce the basic principles of the Wang-Geo method. Then, we test whether its two assumptions are true in two real-world networks.

Basic Principles of Wang-Geo Method.
The Wang-Geo method maps the target IP to the closest landmark in three steps. The first two steps use a modified version of a classical city-level IP geolocation method, CBG [8], to find the city or region where the target IP is located. This paper mainly discusses street-level IP geolocation, which is mainly done in Step 3.
Step 3 proceeds as follows:
(1) As shown in Figure 1, a probing host P measures the delays and router paths to the landmarks (L1, L2, L3) and the target IP T. In this paper, a probing host is a computer that researchers can use to measure the delay between it and other computers and whose geographical location is known to the researchers.
(2) Find the closest common router between T and each landmark. The closest common router between T and L1 is R1; the closest common router between T and L2 is also R1; the closest common router between T and L3 is R3.
(3) Calculate the relative delay [18] between T and each landmark. The Wang-Geo method calculates the relative delay based only on the closest common router. The relative delay between T and L1 is d1 + d2; the relative delay between T and L2 is d2 + d3; the relative delay between T and L3 is d2 + d4 + d5 + d7 + d8.
(4) The landmark which has the smallest relative delay to the target IP is chosen as the candidate landmark. In this paper, the landmark estimated to be nearer to the target IP than the others in a group of landmarks is called a candidate landmark.
(5) If there is only one probing host, the target IP is mapped to the location of its single candidate landmark; if there is more than one probing host, the target IP is mapped to the location of the candidate landmark which has the smallest relative delay.
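Steps (2) to (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the router paths and delay values below are hypothetical, the function names are ours, and the relative delay is computed in the subtraction form d_i + d_j − 2d_r that Section 4 gives for a pair of hosts behind a common router.

```python
def closest_common_router(path_a, path_b):
    """Return the router on both paths that is farthest from the probing host.

    Each path lists router identifiers in hop order, starting near the probing
    host. Returns None when the two paths share no router.
    """
    shared = set(path_b)
    for router in reversed(path_a):
        if router in shared:
            return router
    return None


def relative_delay(delay_a, delay_b, delay_router):
    """Delay between two hosts through a common router, from probe-side delays."""
    return delay_a + delay_b - 2 * delay_router


# Hypothetical measurements from one probing host (illustrative values, in ms).
paths = {
    "T": ["R3", "R2", "R1"],
    "L1": ["R3", "R2", "R1"],
    "L2": ["R3", "R2", "R1"],
    "L3": ["R3", "R4"],
}
delay_ms = {"R1": 4.0, "R2": 3.0, "R3": 2.0, "R4": 3.5,
            "T": 5.0, "L1": 4.5, "L2": 6.0, "L3": 7.0}

relative = {}
for name in ("L1", "L2", "L3"):
    router = closest_common_router(paths["T"], paths[name])
    relative[name] = relative_delay(delay_ms["T"], delay_ms[name], delay_ms[router])

# Step (4): the landmark with the smallest relative delay is the candidate.
candidate = min(relative, key=relative.get)
```

With these made-up numbers the candidate is L1. A negative value returned by `relative_delay` would indicate that the delay to the common router was overestimated.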
The relative delay will be underestimated if the delay to a closest common router is overestimated. This will affect the candidate landmark selection. A router whose delay is overestimated in this way is called an "inflating router" [18]. The Wang-Geo method divides the geographical distance by the relative delay between two landmarks to discover inflating routers. All the measured data associated with inflating routers have to be discarded.
We collect 80 web landmarks in each of two cities: Zhengzhou, a provincial capital city of China, and Toronto, the capital of the Canadian province of Ontario. We calculate the relative delays between web landmarks in each city and check for closest common routers whose corresponding relative delays are negative.
The results show that at least 50% of the closest common routers in both cities are inflating routers. This may be caused by the data processing policy of routers inside cities. Many routers tend to give a low priority to the packets which aim to measure the delay to these routers. This phenomenon may prevent some landmarks which are near the target IP from being selected as candidate landmarks if their routers are inflating. This means that the Wang-Geo method has to give up a considerable number of landmarks because the common routers between them and the target IP are inflating, which may affect the accuracy of the Wang-Geo method.

Motivation.
The Wang-Geo method is based on two important assumptions: (1) for one host, the smallest delay comes from the host which has the smallest geographical distance to it, so the Wang-Geo method selects the landmark which has the smallest relative delay as a candidate landmark; (2) the distance between hosts which share the closest common routers is usually smaller than that between hosts which share the other common routers, so the Wang-Geo method only selects a candidate landmark from landmarks which share the closest common routers with the target IP and ignores all the other landmarks. In a richly connected network environment, such as the PlanetLab dataset used by [18], where these two assumptions are true, the Wang-Geo method can achieve accurate geolocation results. However, as we explain in this section, the two assumptions may not hold for a weakly connected network.

Does the Smallest Delay Always Come from the Closest Landmark?
We can judge whether this assumption is true by the delay-distance correlation. Delay-distance correlation is the first-order linear correlation coefficient between delay and distance [23]. To calculate the delay-distance correlation of a certain network, we need to measure network delays and direct geographical distances between a certain group of hosts in this network. The delay of each pair of hosts is measured many times and only the minimum one is selected. The distance of each pair of hosts is calculated based on [24]. Assume that the variance of delay is V_delay, the variance of distance is V_distance, and the covariance of delay and distance is cov(delay, distance); then, the delay-distance correlation Corr can be calculated by the following formula [12]:

$$\mathrm{Corr} = \frac{\mathrm{cov}(\mathrm{delay}, \mathrm{distance})}{\sqrt{V_{\mathrm{delay}} \cdot V_{\mathrm{distance}}}} \qquad (1)$$

The range of Corr is [−1, 1]. If Corr of a group of hosts is positive and near 1, the delay is strongly positively correlated with the distance, which means that delay increases as distance increases and decreases as distance decreases. In this circumstance, the smallest delay is very likely to come from the closest landmark. However, if Corr is strongly negative and near −1, the largest delay usually comes from the closest landmark; if Corr is close to 0, no matter whether it is negative or positive, there is no clear relationship between delay and distance [22].

Figure 1: Common router and relative delay.
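Formula (1) is the ordinary first-order (Pearson) correlation coefficient, so it can be sketched directly from paired delay and distance samples. The function name and the sample values below are ours and purely illustrative; the paper does not prescribe an implementation.

```python
import math


def delay_distance_corr(delays, distances):
    """Correlation between paired delay and distance samples, as in formula (1)."""
    n = len(delays)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_delay = sum(delays) / n
    mean_dist = sum(distances) / n
    cov = sum((d - mean_delay) * (g - mean_dist)
              for d, g in zip(delays, distances)) / n
    var_delay = sum((d - mean_delay) ** 2 for d in delays) / n
    var_dist = sum((g - mean_dist) ** 2 for g in distances) / n
    if var_delay == 0 or var_dist == 0:
        raise ValueError("Corr is undefined when either variance is zero")
    return cov / math.sqrt(var_delay * var_dist)


# Illustrative samples: delay grows with distance in the first case and
# shrinks with distance in the second.
positive = delay_distance_corr([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
negative = delay_distance_corr([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])
```

The first call yields a value near 1 and the second a value near −1, matching the two strong cases described above.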
In [18], researchers find that the smallest relative delay often comes from the nearest landmarks in the USA. Thus, the Wang-Geo method selects candidate landmarks based on the smallest relative delay. We measure the relative delay and distance between 80 web landmarks in each of Zhengzhou and Toronto. The relative-delay-distance relationships of landmarks in the two cities are shown in Figures 2 and 3. The above Corr is called collective Corr in this paper. Collective Corr is calculated based on the delays and distances between a group of hosts and usually represents the general network characteristics of a certain area. If the collective Corr is strong, the relative delays increase as the distances increase, and the smallest delays often come from the nearest landmark. Such a network is called a richly connected network in this paper. Otherwise, the network is referred to as a moderately or weakly connected network. From Figures 2 and 3, we can see that there is no clear relationship between the smallest relative delay and the smallest distance in either Zhengzhou or Toronto. The Corr values of landmarks in the two cities are both between −0.1 and 0.1 (i.e., weak). Therefore, there is no clear relationship between relative delay and distance in weakly connected networks. The collective Corr is weak in many cities (e.g., Zhengzhou and Toronto). This may be because, between two hosts inside a city, the delay caused by geographical distance may only make up a small proportion of the whole delay. The main part of the whole delay consists of queuing delay and processing delay in routers, which have little relationship with distance.
Though the collective Corr is weak in many cities, we find that the individual Corr of a small number of landmarks could be much stronger. Individual Corr is calculated based on the delays and distances between only one specific host and the other hosts. It only represents the network characteristics of a certain host. Figures 4 and 5 show the individual Corr of web landmarks in Zhengzhou and Toronto. It can be seen that although the absolute value of most individual Corr is under 0.2, there is still a small number of landmarks whose individual Corr is much stronger. For these landmarks, the smallest or largest delay is still probably related to the smallest distance. This encourages us to divide landmarks into different groups by individual Corr and apply different candidate landmark selection strategies in different groups.

Is the Distance between Hosts That Share the Closest Common Routers Always Smaller than That between Hosts That Share the Other Common Routers?
In this paper, multilayered common routers are all the common routers shared by two hosts. Besides the closest common routers used in the Wang-Geo method, multilayered common routers also include the other common routers. As shown in Figure 1, R1, R2, and R3 are the multilayered common routers of T and L1; R2 and R3 are the other common routers of T and L1.
The Wang-Geo method only selects candidate landmarks from the landmarks which share the closest common routers. It assumes that the distance between hosts that share the closest common routers is always smaller than that between hosts which share the other common routers. This assumption is usually true in cities which have an abundant supply of public IP addresses. In such cities, the landmarks which share the closest common routers are very likely to come from one organization because it has a large number of public IP addresses. These hosts are closer to each other than hosts which belong to different organizations.
However, organizations in many cities only possess a very small number of public IP addresses. The hosts inside these organizations usually use private IP addresses. Existing landmark collection methods can hardly obtain all the public IP addresses owned by one organization. Accordingly, the hosts which share the closest common routers and those which share the other common routers are all from different organizations, which means that there is no clear difference between these two kinds of hosts. Figure 6 shows the distribution of landmarks which share the closest common routers and the other common routers with one target IP. The other common routers in Figure 6 are the second closest common routers. In Figure 6, the landmarks which share the closest common routers with the target IP also cover most areas of Zhengzhou, and the distance between them is even larger because their number is smaller.
This encourages us to introduce the landmarks which share multilayered common routers into the selection of candidate landmarks.
Based on the above analysis, we can conclude that, in many cities, the collective Corr is weak. Therefore, always selecting the landmark which has the smallest delay as the candidate landmark may cause a large error distance. Moreover, the distance between landmarks that share the closest common routers with the target IP may not always be the smallest. Besides the two assumptions, there is another important influencing factor, the availability of enough street-level landmarks, which we discuss in the next section. These three problems are the main motivation of the proposed Corr-SLG.

Landmark Collection
A street-level landmark consists of two main components: a public IP address and a street-level geographical location. The number and distribution of street-level landmarks directly influence the accuracy of IP geolocation algorithms. In this section, we will introduce how to collect street-level landmarks in Corr-SLG.

Web Landmark Collection.
Currently, the main method to collect street-level landmarks is the web landmark collection method proposed in the Wang-Geo method [18]. Its basic idea is to discover the organizations which own a website and host it in the city or area where the target IP lies. The IP of a web landmark is the IP of a website server, and the location is the geographical address of the organization.
This method is one of the key contributions of the Wang-Geo method and one of the most important reasons for its high precision. The detailed process of the web landmark collection method can be found in [18].
However, the web landmark has two flaws: (1) both the number and distribution of web landmarks are limited; (2) it has become more and more difficult to find enough web landmarks in recent years. The first flaw is easy to explain. The organizations which own a website are usually abundant only in several metropolises. In addition, most of the organizations which own a website are concentrated in certain areas of the city, like the Central Business District. As for the second flaw, not all organizations put their website servers in their own buildings. In fact, as cloud services develop, more and more organizations deploy their websites in cloud services, which will inevitably reduce the number of available web landmarks. To provide more street-level landmarks, we present a new street-level landmark collection method in this paper.

WiFi Landmark Collection.
In recent years, both the number and distribution of WiFi access points have grown very fast. Many public places like banks, supermarkets, and hotels provide WiFi access points for people to go online. Inspired by this phenomenon, this paper presents a new way to discover street-level landmarks, the WiFi landmark collection method, which is shown in Figure 7.

Security and Communication Networks
In one public place, we connect a smartphone to the WiFi which the public place provides so that the smartphone gets online. Then, the WiFi landmark collection software installed on the smartphone measures the router path to one Internet server. The IP address of a WiFi landmark is the first public IP address of the measured router path from the smartphone to the server. The location information of the WiFi landmark is the Global Positioning System (GPS) location of the smartphone. Besides IP and location, we also need to record the router path to the server, the Service Set Identifier (SSID) of the WiFi access point, the geographical address, and the name of the public place.
After landmark discovery, a number of landmarks have to be removed because the GPS location may be far away from the real geographical location of the IP. If the first IP address of the router path to the server is a public IP address, the distance between the real geographical location of the IP and the GPS location of the smartphone is limited by the scope of the public place, which can usually be ignored. However, if the first public IP address appears on a later hop of the path, the distance may be much larger. We have to check all landmarks in the following steps:
(1) If the first public IP address appears on the first or the second hop of the router path, the landmark is preserved; the others are discarded.
(2) Then, check all the preserved landmarks and find the landmarks which have the same IP. The WiFi access points of these landmarks use the same router to go online. If all these landmarks belong to one public place, they can be replaced by one single landmark. The IP of this landmark is the IP of the replaced landmarks, and the location is the location of the public place to which the replaced landmarks belong. If these landmarks belong to different public places, but the maximum distance between them is under 50 m, then they can be replaced by one single landmark, too. The IP of this landmark is the IP of the replaced landmarks, while the location is the average of the GPS locations of the replaced landmarks. If the distance is larger than 50 m, all of the landmarks which share the same IP are discarded.
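The two checking steps can be sketched as a small filter. The record fields, the `place_locations` lookup (public place name to coordinates), and the use of the haversine great-circle distance are assumptions made for this illustration; the paper does not specify data structures or a distance formula, and this sketch discards a group whose spread is exactly 50 m.

```python
import math
from collections import defaultdict
from itertools import combinations


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def clean_wifi_landmarks(records, place_locations, max_spread_m=50.0):
    """Apply checking steps (1) and (2); returns {ip: (lat, lon)}."""
    # Step (1): keep records whose first public IP is on hop 1 or hop 2.
    by_ip = defaultdict(list)
    for rec in records:
        if rec["first_public_hop"] <= 2:
            by_ip[rec["ip"]].append(rec)

    landmarks = {}
    for ip, group in by_ip.items():
        places = {rec["place"] for rec in group}
        if len(places) == 1:
            # Step (2), same public place: use the location of that place.
            landmarks[ip] = place_locations[places.pop()]
            continue
        fixes = [(rec["lat"], rec["lon"]) for rec in group]
        spread = max(haversine_m(p, q) for p, q in combinations(fixes, 2))
        if spread < max_spread_m:
            # Different places within 50 m: average the GPS fixes.
            landmarks[ip] = (sum(f[0] for f in fixes) / len(fixes),
                             sum(f[1] for f in fixes) / len(fixes))
        # Otherwise every landmark sharing this IP is discarded.
    return landmarks


# Hypothetical records; the IPs come from a documentation address range.
records = [
    {"ip": "203.0.113.1", "first_public_hop": 1, "lat": 34.7500, "lon": 113.6200, "place": "Bank A"},
    {"ip": "203.0.113.2", "first_public_hop": 3, "lat": 34.7600, "lon": 113.6300, "place": "Hotel D"},
    {"ip": "203.0.113.3", "first_public_hop": 2, "lat": 34.7500, "lon": 113.6500, "place": "Cafe B"},
    {"ip": "203.0.113.3", "first_public_hop": 2, "lat": 34.7501, "lon": 113.6500, "place": "Shop C"},
    {"ip": "203.0.113.4", "first_public_hop": 1, "lat": 34.7000, "lon": 113.6000, "place": "Mall E"},
    {"ip": "203.0.113.4", "first_public_hop": 1, "lat": 34.7100, "lon": 113.6000, "place": "Gym F"},
]
place_locations = {"Bank A": (34.7502, 113.6201)}
cleaned = clean_wifi_landmarks(records, place_locations)
```

In this made-up dataset only the first and third IPs survive: the second fails the hop check and the fourth spans more than 50 m.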
WiFi landmarks are very reliable because the distance between the real geographical location of a public IP address and the GPS location of the smartphone is limited. Furthermore, WiFi access points are widely distributed in many cities. In fact, many families and companies also use WiFi access points. The main shortcoming of WiFi landmarks is their high collection cost. Currently, they should be used as a supplementary measure if the web landmarks are insufficient in certain areas. In this paper, we mainly use WiFi landmarks as the target IPs to be located in the experiments.

IP Geolocation Algorithm of Corr-SLG
In this section, we illustrate the IP geolocation algorithm of Corr-SLG. This paper mainly focuses on street-level IP geolocation, so here we assume that before IP geolocation, we already know the city where the target IP is located and have enough street-level landmarks of this city. The geolocation result of Corr-SLG is different on each probing host. The final result is the average of the geolocation results of all probing hosts. Accordingly, unless stated otherwise, the following description of the IP geolocation algorithm concerns one single probing host.

Extracting Multilayered Common Routers.
First, the probing host measures the router paths to all landmarks, and then it measures the delays to all the landmarks and the intermediate routers. The delay between each pair of hosts is measured many times, and only the minimum one is selected.
Second, multilayered common routers between each pair of landmarks are extracted (multilayered common routers are all common routers between two hosts). Finally, we obtain all the common routers of the landmark dataset. For each common router, we find the corresponding landmark dataset, i.e., the landmarks whose router paths include the common router.
Though some landmarks are included in the landmark datasets of several common routers, the landmark datasets of different common routers are usually different. In this paper, we refer to a common router and its landmark dataset as "a layer." The relationship between one common router and its landmark dataset is shown in Figure 8.
Each pair of landmarks usually shares at least one common router, the probing host. However, in some special circumstances (e.g., when the preceding hops of the router path are all anonymous routers), a landmark may not share any common routers with the other landmarks. Because both the Wang-Geo method and Corr-SLG need to geolocate the target IP by common routers, in this circumstance, we can add a temporary virtual common router before all the paths of landmarks. The delay between the probing host and the virtual common router is zero. However, the fundamental way to solve this problem is to get enough landmarks or change the probing hosts.
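A layer can be represented as a mapping from a common router to the set of landmarks whose paths include it. The sketch below is an assumption-laden illustration: paths are lists of router identifiers with anonymous hops already removed, `VIRTUAL_ROUTER` is a hypothetical identifier for the temporary virtual common router, and a router counts as common when at least two landmarks share it.

```python
from collections import defaultdict

VIRTUAL_ROUTER = "virtual-root"  # hypothetical name; its delay is taken as zero


def build_layers(paths):
    """Map each common router to the landmarks whose router paths include it.

    paths: {landmark: [router ids in hop order from the probing host]}
    """
    members = defaultdict(set)
    for landmark, routers in paths.items():
        for router in set(routers):
            members[router].add(landmark)

    # A router forms a layer only if at least two landmarks share it.
    layers = {router: lms for router, lms in members.items() if len(lms) >= 2}

    # If some landmark shares no router with any other, add a virtual common
    # router in front of all paths so that every landmark belongs to a layer.
    covered = set().union(*layers.values())
    if set(paths) - covered:
        layers[VIRTUAL_ROUTER] = set(paths)
    return layers


# Hypothetical router paths measured from one probing host.
example_paths = {
    "L1": ["R3", "R2", "R1"],
    "L2": ["R3", "R2", "R1"],
    "L3": ["R3", "R4"],
    "L4": ["R9"],
}
example_layers = build_layers(example_paths)
```

Here L4 shares no router with the other landmarks, so the virtual layer containing all four landmarks is added.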

Calculating Individual Corr.
In this part, we need to calculate the individual Corr of each landmark. Before this, we need to measure the relative delay between landmarks in each layer. For two landmarks L_i and L_j in one layer, assume that the delay between the probing host and L_i is d_i, the delay between the probing host and L_j is d_j, and the delay between the probing host and the common router of this layer is d_r; then the relative delay between L_i and L_j in this layer is d_i + d_j − 2d_r. Thus, the relative delay may be different even for the same pair of landmarks if they are in different layers, and so is the individual Corr of each landmark. The individual Corr of one landmark is calculated based on its relative delay and distance to the other landmarks in the same layer (the formula of Corr is given in Section 2). For one landmark, its individual Corr in different layers is usually different.
If the number of landmarks in one layer is less than 5, there is no need to calculate the individual Corr, because the absolute value of the individual Corr tends to be too large.
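A per-layer computation of individual Corr might look as follows. This is a sketch under assumptions: delays are the minimum measured delays from the probing host, pairwise distances are supplied in a dictionary keyed by unordered pairs, and all names and values are hypothetical. Layers with fewer than 5 landmarks return no Corr values, as stated above.

```python
import math

MIN_LAYER_SIZE = 5


def pearson(xs, ys):
    """First-order linear correlation; None when either variance is zero."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


def individual_corr(layer, delay, router_delay, distance):
    """Individual Corr of every landmark in one layer.

    layer: list of landmark names; delay: {landmark: delay from probing host};
    router_delay: delay from the probing host to the layer's common router;
    distance: {frozenset((a, b)): geographical distance between a and b}.
    """
    if len(layer) < MIN_LAYER_SIZE:
        return {}
    result = {}
    for li in layer:
        others = [lj for lj in layer if lj != li]
        # Relative delay in this layer: d_i + d_j - 2 * d_r.
        rel = [delay[li] + delay[lj] - 2 * router_delay for lj in others]
        dist = [distance[frozenset((li, lj))] for lj in others]
        result[li] = pearson(rel, dist)
    return result


# Hypothetical layer: five landmarks on a line, delay rising with position.
position_km = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0, "E": 4.0}
delay_ms = {name: 10.0 + pos for name, pos in position_km.items()}
names = sorted(position_km)
distance_km = {frozenset((a, b)): abs(position_km[a] - position_km[b])
               for a in names for b in names if a < b}
corr_by_landmark = individual_corr(names, delay_ms, 5.0, distance_km)
```

In this toy layer, landmark A has an individual Corr near 1, E near −1, and C near 0, which shows how landmarks in the same layer can fall into different groups.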

Searching for the Best Combination of Parameters.
Before the IP geolocation algorithm of Corr-SLG can geolocate an unknown target IP, we need to set three key parameters to make the geolocation result as accurate as possible. This part searches for the best combination of parameters: we geolocate the landmark dataset and find the combination of parameters whose corresponding median error distance is the smallest. This part is the critical process of Corr-SLG and consists of three steps.

Selecting Candidate Landmarks in Each Layer.
In this step, a landmark is treated as a target IP and the other landmarks are used to geolocate it. Candidate landmarks are selected from the landmarks which share common routers with the target IP.
For the target IP, first, we need to check its router path and extract all the common routers its path includes. Then, each layer chooses its own candidate landmarks. As shown in Figure 9, R is one common router on the path of the target IP T, and Li (L1, L2, L3, . . .) are the landmarks of this layer. Corr-SLG divides the landmarks of this layer into three groups based on two key parameters, C_a and C_b. If the individual Corr is larger than C_a, the landmark belongs to Group A; if the individual Corr is less than C_b, the landmark belongs to Group B; the other landmarks belong to Group C. The candidate landmark selection strategy is different in each group: the candidate landmark of Group A is the landmark which has the smallest relative delay to the target IP; the candidate landmark of Group B is the landmark which has the largest relative delay to the target IP; and the candidate landmark of Group C is selected randomly. If the number of landmarks in one layer is less than 5, all of them are chosen as candidate landmarks. Otherwise, each layer selects at least 1 and at most 3 candidate landmarks.
The Wang-Geo method has to give up all the data associated with "inflating routers." Because Corr-SLG selects candidate landmarks in each layer and the influence of the inflating router is the same for all the landmarks in the same layer, there is no need to discard any router or landmarks even when the relative delay is negative. This can help increase accuracy.
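The grouping rule for one layer can be sketched as follows. The function and argument names are ours, the sketch assumes C_b ≤ C_a and that every landmark in a layer of 5 or more has an individual Corr, and the random choice for Group C uses Python's standard generator.

```python
import random


def select_candidates(layer, indiv_corr, rel_delay_to_target, c_a, c_b, rng=random):
    """Pick the candidate landmarks of one layer.

    layer: list of landmark names; indiv_corr: {landmark: individual Corr};
    rel_delay_to_target: {landmark: relative delay to the target IP}.
    """
    if len(layer) < 5:
        # Small layers have no individual Corr: keep every landmark.
        return list(layer)

    group_a = [lm for lm in layer if indiv_corr[lm] > c_a]
    group_b = [lm for lm in layer if indiv_corr[lm] < c_b]
    group_c = [lm for lm in layer if c_b <= indiv_corr[lm] <= c_a]

    candidates = []
    if group_a:  # strongly positive: smallest relative delay
        candidates.append(min(group_a, key=rel_delay_to_target.get))
    if group_b:  # strongly negative: largest relative delay
        candidates.append(max(group_b, key=rel_delay_to_target.get))
    if group_c:  # no clear relationship: random pick
        candidates.append(rng.choice(group_c))
    return candidates


# Hypothetical layer with six landmarks.
layer = ["L1", "L2", "L3", "L4", "L5", "L6"]
indiv_corr = {"L1": 0.8, "L2": 0.6, "L3": -0.7, "L4": -0.9, "L5": 0.1, "L6": 0.0}
rel_delay = {"L1": 3.0, "L2": 1.0, "L3": 2.0, "L4": 5.0, "L5": 4.0, "L6": 6.0}
picked = select_candidates(layer, indiv_corr, rel_delay, c_a=0.5, c_b=-0.5,
                           rng=random.Random(0))
```

With these made-up values, Group A contributes L2, Group B contributes L4, and Group C contributes either L5 or L6.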

Discarding Outliers.
In this step, we need to discard landmarks that may be wrongly selected as candidate landmarks. For example, landmarks which do not have an individual Corr or are randomly selected from Group C may be very far from the target IP. The wrongly selected candidate landmarks can be discarded by detecting outliers. We gather the candidate landmarks from each layer. If the number of all candidate landmarks is more than 2, we can use the LOF (local outlier factor) algorithm [25] to detect the outliers. Candidate landmarks are sorted by LOF value in ascending order. The LOF value of outliers is bigger than that of the others. Another key parameter, R, is used here to control the number of outliers that will be discarded. Only the first R% of all candidate landmarks are kept. The range of R is [1, 100].
If there are no more than 2 candidate landmarks, there is no need to detect outliers. The geolocation result is the average of the locations of all remaining candidate landmarks.
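The outlier step can be illustrated with a simplified, self-contained LOF. Several things here are assumptions rather than the paper's choices: candidate locations are planar (x, y) coordinates in kilometres, each point uses exactly k nearest neighbours (k = 2 by default, ties broken by order), points are distinct, and the number of kept candidates is rounded up with at least one kept. A library implementation of LOF would normally be preferable in practice.

```python
import math


def lof_scores(points, k=2):
    """Simplified local outlier factor for a small list of (x, y) points."""
    n = len(points)
    k = min(k, n - 1)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / max(sum(reach), 1e-12))

    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]


def discard_outliers(points, r_percent, k=2):
    """Keep the first R% of candidates when sorted by ascending LOF."""
    if len(points) <= 2:
        return list(points)
    scores = lof_scores(points, k)
    ranked = sorted(range(len(points)), key=scores.__getitem__)
    keep = max(1, math.ceil(len(points) * r_percent / 100))
    return [points[i] for i in ranked[:keep]]


def centroid(points):
    """Average location of the remaining candidate landmarks."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


# Hypothetical candidates: four close together and one far away.
candidates_km = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
kept = discard_outliers(candidates_km, r_percent=80)
estimate = centroid(kept)
```

With R = 80, the distant candidate has the largest LOF and is dropped, and the estimate is the centre of the remaining four.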

Finding Minimum Median Error Distance.
Only after the three key parameters are set can Corr-SLG produce a geolocation result. This paper searches for the best combination of parameters by finding the minimum median error distance of the landmark dataset. When the landmark dataset is geolocated for the first time, C_a is 0, C_b is −1, and R is 1. After getting the geolocation results of all landmarks, the median error distance is calculated. Then, C_a and C_b are increased in steps of 0.1 and R in steps of 1, and all the landmarks are geolocated again using each new combination of parameters. Finally, all combinations of the three parameters and their corresponding median error distances are gathered. The best combination of parameters is the one that has the minimum median error distance.
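The search amounts to an exhaustive grid search over the three parameters. In the sketch below, `geolocate` and `error_km` are hypothetical callbacks standing in for the Corr-SLG pipeline and the error-distance computation, and the upper bounds of the C_a and C_b grids (1 and 0) are our assumption; the text only gives the starting values and step sizes.

```python
import itertools
import statistics


def search_best_parameters(landmarks, geolocate, error_km,
                           ca_values, cb_values, r_values):
    """Return (median_error, c_a, c_b, r) with the smallest median error distance.

    geolocate(landmark, c_a, c_b, r) -> estimated location (hypothetical callback)
    error_km(landmark, estimate)     -> error distance in km (hypothetical callback)
    """
    best = None
    for c_a, c_b, r in itertools.product(ca_values, cb_values, r_values):
        errors = [error_km(lm, geolocate(lm, c_a, c_b, r)) for lm in landmarks]
        median_error = statistics.median(errors)
        if best is None or median_error < best[0]:
            best = (median_error, c_a, c_b, r)
    return best


# Parameter grids following the step sizes in the text (bounds assumed).
CA_GRID = [i / 10 for i in range(11)]         # 0.0, 0.1, ..., 1.0
CB_GRID = [(i - 10) / 10 for i in range(11)]  # -1.0, -0.9, ..., 0.0
R_GRID = list(range(1, 101))                  # 1, 2, ..., 100

# Toy stand-ins so the sketch runs: the synthetic error is smallest at
# c_a = 0.4, c_b = -0.3, r = 35. These are not measured results.
def toy_geolocate(landmark, c_a, c_b, r):
    return (c_a, c_b, r)

def toy_error_km(landmark, estimate):
    c_a, c_b, r = estimate
    return abs(c_a - 0.4) + abs(c_b + 0.3) + abs(r - 35) / 100

best = search_best_parameters(["lm1", "lm2", "lm3"], toy_geolocate, toy_error_km,
                              CA_GRID, CB_GRID, R_GRID)
```

The grid has 11 × 11 × 100 combinations, and each one requires geolocating the whole landmark dataset, so the cost of the search grows linearly with the number of landmarks.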

Geolocating the Unknown Target IP.
First, the probing host measures the router path to the target IP and then measures the delay to the target IP and the intermediate routers. The delay between each pair of hosts is measured many times, and only the minimum one is selected. Then, the probing host geolocates the target IP based on the best parameters in the same way as it geolocates a landmark. The only difference is that the parameters used to geolocate the target IP are already determined. If there is only one probing host, its geolocation result is the estimated location of the target IP; if there is more than one probing host, the average of the geolocation results of all probing hosts is the estimated location of the target IP.

Evaluation
To test whether Corr-SLG can increase the accuracy of street-level IP geolocation, this paper conducts experiments in Zhengzhou, a provincial capital city of China.

Experiment Dataset.
The previous experiments for the Wang-Geo method [18] and Checkin-Geo [16] use web landmarks as the landmark dataset. To keep consistent with previous work, this paper also uses web landmarks as the landmark dataset. WiFi landmarks with known locations are used as the target IP dataset.
This paper discovers 3104 websites based on organization names in Zhengzhou. After the websites which may not put their servers inside the organizations are discarded, 181 web landmarks are retained. 163 WiFi landmarks are collected in the way illustrated in Section 3. In this experiment, only one probing host located in our lab is used to reduce the deployment cost.

Searching for Best Parameters.
The probing host searches for the best parameters based on the landmark dataset. When C_a = 0.45, C_b = 0.35, and R = 0.35, the median error distance of Corr-SLG for the landmark dataset is the smallest, 3.34 km. The median error distance of the Wang-Geo method for the landmark dataset is 8.95 km. With the best parameters, Corr-SLG decreases the median error distance by 63.27%. Figure 10 shows the cumulative probability of error distances of Corr-SLG and the Wang-Geo method on the landmark dataset.
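The two evaluation quantities used here, the median error distance and the cumulative probability of error distances, can be computed as sketched below. The haversine formula is our assumption for the distance between an estimated and a true location, and the error values are illustrative, not the paper's data.

```python
import math
import statistics


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def empirical_cdf(errors):
    """Return (error, cumulative probability) pairs, smallest error first."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(err, (i + 1) / n) for i, err in enumerate(ordered)]


# Illustrative error distances in km (not measured results).
errors_km = [4.0, 1.0, 3.0, 2.0]
median_error = statistics.median(errors_km)
cdf = empirical_cdf(errors_km)
```

Plotting the pairs returned by `empirical_cdf` for each method gives curves of the kind shown in Figures 10 and 11.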

Geolocating Target IP Dataset.
Then, the probing host geolocates the target IP dataset based on the best parameters. The median error distance of Corr-SLG for the target IP dataset is 4.82 km. The median error distance of the Wang-Geo method is 7.85 km. Corr-SLG decreases the median error distance by 38.59%. Figure 11 shows the cumulative probability of error distances of Corr-SLG and the Wang-Geo method for the target IP dataset. Based on the experiment results, Corr-SLG can increase the accuracy of street-level IP geolocation by about 38.59% in Zhengzhou.
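The relative reduction for the target IP dataset follows from the two reported medians, taking the Wang-Geo median as the baseline:

```latex
\frac{7.85\ \text{km} - 4.82\ \text{km}}{7.85\ \text{km}} = \frac{3.03}{7.85} \approx 0.386
```

This corresponds to the reduction of about 38.59% stated above.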
From Figures 10 and 11, we can see that, at first, the performance of Corr-SLG is much better than that of Wang-Geo and, in the end, Corr-SLG is only slightly better. This means the following: (1) for targets with smaller error distances, Corr-SLG is more accurate than Wang-Geo; (2) for targets with larger error distances, Corr-SLG is similar to Wang-Geo. This phenomenon has the following cause. There are 3 kinds of hosts: (1) the smallest delay comes from the shortest distance; (2) the smallest delay comes from the longest distance; and (3) the smallest delay comes from a random distance. Corr-SLG is much better than Wang-Geo on the second kind of hosts and equal to Wang-Geo on the first kind of hosts. For the last kind of hosts, neither Wang-Geo nor Corr-SLG is accurate.
Please note that, in the cumulative probability, the smaller error distances are shown (added) first. For Wang-Geo, the hosts with smaller error distances are only of the first kind. For Corr-SLG, the hosts with smaller error distances include both the first and second kinds of hosts. Therefore, at first, there are many more hosts with smaller error distances for Corr-SLG than for Wang-Geo. However, for both Corr-SLG and Wang-Geo, the error distances for the third kind of hosts are large, much larger than those of the other 2 kinds of hosts. Hence, in the end (for hosts with larger error distances), the performance of Corr-SLG seems similar to that of Wang-Geo. The main reason for the phenomenon is that Corr-SLG and Wang-Geo are both based on the delay-distance relationship, which is not clear for the third kind of hosts. How to improve the accuracy for the third kind of hosts remains as our future work.
Besides, we can also see that the improvement on the target IP dataset is clearly smaller than that on the landmark dataset. That is because the best parameters for the landmark dataset may not be exactly suitable for the target IP dataset. If there are several kinds of hosts in one city and the difference between their network characteristics is relatively significant, such as hosts belonging to different ISPs (Internet Service Providers), using a landmark dataset whose network characteristics are similar to those of the target IP may achieve better accuracy.

Conclusion
To improve the performance of street-level IP geolocation in weakly connected networks, this paper proposes a measurement-based street-level IP geolocation algorithm called Corr-SLG. First, this method introduces the landmarks associated with multilayered common routers into candidate landmark selection. This aims to make sure that most landmarks near the target IP have a chance to participate in candidate landmark selection. Second, Corr-SLG selects candidate landmarks in each layer of the common router to avoid the influence of inflating routers. Third, it divides landmarks into three groups based on individual Corr and uses different candidate landmark selection strategies in different groups. This can achieve better accuracy in weakly connected networks where the smallest delay may not come from the closest landmark. Last but not least, we present a new street-level landmark collection method called WiFi landmark, which is inspired by the rapid growth of WiFi service. WiFi landmarks can be of great help in cities where the number of web landmarks is insufficient. In this work, we find that there are three different kinds of relative-delay-distance correlation through measurements in two real inside-city networks. This finding helps to improve the accuracy of Corr-SLG. However, we still lack a theoretical explanation of what causes the different kinds of relative-delay-distance correlation. In our future work, we will carry out network measurements in more cities in various countries and try to explain the cause of the different kinds of relative-delay-distance correlation. This can help us to extend Corr-SLG to more cities and better understand inside-city networks.
Data Availability
The datasets of this work are available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare no conflicts of interest in publishing this article.