The Settlement Structure Is Reflected in Personal Investments: Distance-Dependent Network Modularity-Based Measurement of Regional Attractiveness

How are ownership relationships distributed in geographical space? Is physical proximity a significant factor in investment decisions? What is the impact of the capital city? How can the structure of investment patterns characterize the attractiveness and development of economic regions? To explore these issues, we analyze the network of company ownership in Hungary and determine how connections are distributed in geographical space. Based on the calculation of the internal and external linking probabilities, we propose several measures to evaluate the attractiveness of towns and geographic regions. Community detection based on several null models indicates that modules of the network coincide with administrative regions, in which Budapest is the absolute centre and county centres function as hubs. Gravity model-based modularity analysis highlights that, besides the strong attraction of Budapest, geographical distance has a significant influence on the frequency of connections and that the target nodes play the most significant role in link formation, which confirms that the analysis of the directed company-ownership network gives a good indication of regional attractiveness.


Introduction
Mining valuable information from social networks is a hard problem due to their dynamic nature [1,2], complex structure [3,4], and multidimensionality [5]. This paper deals with the structural issues, as it evaluates regional attractiveness based on a set of goal-oriented null models identified to describe the geographical distribution of company-ownership relations.
Complex multivariate socioeconomic data are widely used to monitor regional policy [6,7]. As different sets of variables result in different rankings, the definition and selection of socioeconomic variables are the key issues in these applications. The drawback of these indicator-based approaches is that, although economic behavior is socially constructed, embedded in networks of interpersonal relations [8], and strongly related to location [9], the network structure of the economy is neglected.
This paper adds a viewpoint to regional studies based on the analysis of how the network of personal investments and the founding of companies relate to the settlement hierarchy. We assume that the socially embedded economy must have a network-based imprint in the company-ownership network, which is a good indication of regional attractiveness.
Attractiveness is meaningful in preferential attachment networks, where the likelihood of a new connection is proportional to the degree [10] and fitness [11] of the node. These models were generalized to handle initial attractiveness [12] and latecomer nodes with a higher degree of fitness [11,13]. It is important to note that these models generate power-law (degree) distributions that are similar to the distribution of socioeconomic variables of settlements, indicating that preferential attachment is a process that can be used to describe city growth [14][15][16][17][18]. In the case of geographically distributed networks, the likelihood of link formation is dependent on distance due to the cost of establishing connections and spatial constraints [19]. Connection costs also favor the formation of cliques and thus increase the clustering coefficient [20]. Space is important in social networks, as most individuals connect with their spatial neighbors [20] to minimize their effort and maintain social ties [21]; e.g., the majority of our friends are in our spatial neighborhood [22]. The probability $P(d)$ that distance $d$ separates two connected individuals is found to behave as $P(d) \sim d^{-2}$ in terms of Belgian mobile phone data [23], or generally $P(d) \sim d^{-\alpha}$, as has been shown in the case of the social network of more than one million bloggers in the USA [24], in the friendship network of Facebook users, and in email communication networks [25,26].
The attractiveness of airports [27], countries for foreign investments [28], and touristic destinations [29] is evaluated based on socioeconomic variables. As many origins and destinations are present in these applications, the theory of bilateral trade flows accounts for the relative attractiveness of origin-destination pairs. The gravity model is one of the most successful empirical models in economics developed to describe such interactions across space [30]. Almost 40 years ago, before the emergence of network science, Anderson suggested that, like the force between two mass points, the number of trips from location $i$ to location $j$ follows the (economic version of the) "Gravity" law, $P(d) \sim f(d) \sim m_i^{\alpha_1} m_j^{\alpha_2} d^{-\beta}$, where $m_i$ and $m_j$ denote the sizes ("masses") of the two locations [31]. Nowadays, many complex networks are embedded in space, and spatial constraints may have an effect on their connectivity patterns. Networks such as trade markets [32], migration [33], traffic flow [34], and mobile communication [23] can be successfully modeled by a gravity model, which was also successfully applied in link prediction [35].
We assume that regions that heavily rely on local resources consist of more internal connections that form modules in networks, so the modularity of networks that reflect socioeconomic relationships can be used to measure regional attractiveness. The goal of modularity analysis is to separate the network into groups of vertices that have fewer connections between them than inside the communities [36]. In social network analysis, community detection is a basic step in understanding the structure, function, and semantics of networks [4]. Community analysis is performed in two separate phases: first, the detection of a meaningful community structure in a network, and second, the evaluation of the appropriateness of the detected community structure [37]. Systematic deviations from a random configuration allow us to define a quantity called modularity, which is a measure of the quality of partitions. Newman-Girvan modularity considers only the degree of nodes as a null model, which is equivalent to rewiring the network whilst preserving the degree sequence [38,39]. This random model overlooks the spatial nature of the network; thus, it is blind to spatial anomalies and fails to uncover modules determined by factors other than mere physical proximity [19], which is the reason why several distance-dependent null models have been proposed recently [19,37,40,41].
Our goal is to use the tools of network community detection to evaluate the attractiveness of the elements of settlement hierarchies (towns, statistical subregions, counties, and regions) based on their modularities as well as internal and external connection densities. We study the internal connections of the ownership network from the viewpoint of the Newman-Girvan, spatial, and gravity-based null models. As the modularity is based on the difference between the actual and estimated edge weights, the more accurately the null model describes the real spatial network, the closer the total modularity is to zero, so the modules highlight the hidden structural similarities. We developed a visualization technique to analyze these unknown effects on community structure, which can explain the attractiveness of a settlement/region. Besides measuring the attractiveness, we utilize the Louvain community detection algorithm [42,43] to identify closely related regions. We examine the complete investment network of Hungarian companies to explore how the ownership connections are geographically distributed, what the structure of the network is, and what the common connection directions are, as well as how the extracted information is correlated with the settlement hierarchy. The studied database contains information about the owners and addresses of the companies. The results highlight the fact that the distance dependence of the investment connections is more significant than was found in online social networks [22,26,44]. The analysis shows that the network is hierarchical and modular as well as shaped according to the settlement hierarchy, in which Budapest is the absolute center and the centers of counties function as hubs.
The outline of this paper is as follows: Section 2.1 presents the company-ownership network. The metrics related to attractiveness are given in the Appendix. Section 2.2 describes the null models designed by us to measure modularity as well as handle physical proximity and presents how closely related regions can be explored based on the modularity-related merging of towns and subregions. The results and discussion are provided in Section 3.

Problem Formulation: Settlement Hierarchy and Community Structure in Personal Investment Patterns

Network Representation of Personal Investment Patterns.
The proposed methodology is based on the analysis of a directed investment network represented by an asymmetric biadjacency matrix $\mathbf{A}^{[p,c]}$, whose elements are defined as

$$a^{[p,c]}_{i,j} = \begin{cases} 1 & \text{if the } i\text{-th person owns the } j\text{-th company,} \\ 0 & \text{otherwise.} \end{cases}$$

As the addresses of the owners and their companies are known, connections between companies and their owners define ties between geographic locations. According to the levels of the settlement hierarchy, a four-level study can be defined to describe how towns, regions, or counties are connected through company ownerships (see Figure 1). Although companies also own shares in other companies, as we intended to study the attractiveness of economic regions based on personal investment decisions, we examined only companies that belong to individuals. The levels of the settlement hierarchy $[l]$ are defined based on the nomenclature of territorial units for statistics classification (NUTS) and the two levels of local administrative units (LAUs):

$$l = \begin{cases} 1 & \text{towns (LAU 2),} \\ 2 & \text{statistical subregions (LAU 1),} \\ 3 & \text{counties (NUTS 3),} \\ 4 & \text{regions (NUTS 2).} \end{cases}$$

(Please note that, for simplicity, the term "town" is used for all cities and villages.) People and their companies are assigned to geographic regions by the $\mathbf{A}^{[c,l]}$ and $\mathbf{A}^{[p,l]}$ incidence matrices, whose elements are defined as follows: (i) $a^{[c,l]}_{j,k}$ is equal to one if the headquarters of the $j$-th company is situated in the $k$-th geographic region at level $l$ of the settlement hierarchy, (ii) $a^{[p,l]}_{i,k}$ is equal to one if the $i$-th person is situated in the $k$-th geographic region at level $l$ of the settlement hierarchy, so the directed weighted network that defines the number of investment connections between the regions can be defined as

$$\mathbf{A}^{[l]} = \left(\mathbf{A}^{[p,l]}\right)^{T} \mathbf{A}^{[p,c]} \mathbf{A}^{[c,l]}.$$

Although companies may have many local divisions, the links between the towns are defined only by connecting the permanent addresses of the owners and the location of the headquarters. This arrangement results in a transparent and easily interpretable network, as people and companies are assigned to only one location. The resultant network describes how investments unite the locations; e.g., the adjacency matrix $\mathbf{A}^{[1]}$ defines the number of links between the towns, and the degrees of the nodes represent the number of incoming and outgoing investments to the $j$-th and from the $i$-th town, respectively:

$$k^{[1,in]}_j = \sum_i a^{[1]}_{i,j}, \qquad k^{[1,out]}_i = \sum_j a^{[1]}_{i,j}.$$

The total number of ownership relationships is equal to the sum of the edge weights of the networks:

$$L = \sum_{i,j} a^{[l]}_{i,j},$$

where $i$ and $j$ represent the indices of the geographic regions at level $l$ of the settlement hierarchy.

Complexity
It should be noted that as $L$ represents the total number of connections, its value is independent of the hierarchy level at which the edge weights are summed.
Similarly, the total number of companies and investors can be calculated by summing the number of companies and people at any hierarchy level, respectively:

$$N_c = \sum_k N^{[l]}_{c,k}, \qquad N_p = \sum_k N^{[l]}_{p,k},$$

where $k$ represents the index of the geographic regions at level $l$ of the settlement hierarchy.
As people and companies are assigned to only one geographical region by the $\mathbf{A}^{[p,l]}$ and $\mathbf{A}^{[c,l]}$ incidence matrices, the number of people and companies in the $k$-th region of the $[l]$-th level of the settlement hierarchy can be calculated as

$$N^{[l]}_{p,k} = \sum_i a^{[p,l]}_{i,k}, \qquad N^{[l]}_{c,k} = \sum_j a^{[c,l]}_{j,k}.$$

The number of internal and external links of the network and the analysis of the local densities can be used to measure the attractiveness of the regions (see the Appendix). The following main body of the paper focuses on models that can be used to explore the communities in the network.
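The construction of the region-level network can be illustrated with a minimal NumPy sketch. The toy matrices below are hypothetical (they are not taken from the studied database), and the variable names `A_pc`, `A_pl`, and `A_cl` are assumptions that stand for the person-company biadjacency matrix and the person-region and company-region incidence matrices, respectively.

```python
import numpy as np

# Hypothetical toy data: 4 people, 3 companies, 2 towns (illustrative values only).
# A_pc[i, j] = 1 if person i owns company j.
A_pc = np.array([[1, 0, 0],
                 [1, 1, 0],
                 [0, 0, 1],
                 [0, 1, 1]])
# A_pl[i, k] = 1 if person i lives in town k.
A_pl = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
# A_cl[j, k] = 1 if the headquarters of company j is in town k.
A_cl = np.array([[1, 0], [0, 1], [0, 1]])

def region_network(A_pc, A_pl, A_cl):
    """Aggregate ownership links into a directed, weighted region-to-region matrix."""
    return A_pl.T @ A_pc @ A_cl

A1 = region_network(A_pc, A_pl, A_cl)  # rows: town of the owner, columns: town of the company
k_out = A1.sum(axis=1)                 # outgoing investments of each town
k_in = A1.sum(axis=0)                  # incoming investments of each town
L = A1.sum()                           # total number of ownership links
N_p = A_pl.sum(axis=0)                 # number of investors per town
N_c = A_cl.sum(axis=0)                 # number of companies per town
```

Because every person and every company belongs to exactly one region, the total `L` equals the number of ones in `A_pc` regardless of the hierarchy level used for the aggregation.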

Evaluation of the Community Structure in the Settlement Hierarchy.
The key idea of the methodology is that geographical regions can be interpreted as nonoverlapping communities of investors and companies, as each of them belongs to exactly one region among the set of regions on the $l$-th level of the hierarchy, $C^{[l]} = \{C^{[l]}_1, C^{[l]}_2, \ldots, C^{[l]}_i, \ldots, C^{[l]}_j, \ldots\}$. From the view of a community, the external degree is the number of links that connect the $i$-th community to the rest of the network, while the internal degree is the number of links between companies and owners in the same community, in other words, at the same location at the $l$-th level of the hierarchy (for more details see Appendix A). Recently, a wide variety of $Q(C)$ metrics have been proposed to evaluate the quality of communities on the basis of the connectivity of their nodes [37]. The following subsections will demonstrate how these metrics can be interpreted to evaluate the attractiveness of geographical regions.

Modularity of a Region and Level of a Settlement Hierarchy.
Classical modularity optimization-based community detection methods utilize $Q(C)$ metrics that are based on the difference between the internal number of edges and their expected number [39,45]:

$$Q(C) = (\text{fraction of edges within communities}) - (\text{expected fraction of such edges}). \tag{10}$$

In the case of the proposed directed network, this difference can be formulated as

$$Q^{[l]} = \frac{1}{L}\sum_{i,j}\left(a^{[1]}_{i,j} - p^{[1]}_{i,j}\right)\delta\left(C^{[l]}_i, C^{[l]}_j\right), \tag{11}$$

where $p^{[1]}_{i,j}$ represents the number of estimated investments proceeding from the $i$-th to the $j$-th town and $\delta(C^{[l]}_i, C^{[l]}_j)$ is the Kronecker delta function that is equal to one if the $i$-th and $j$-th towns are assigned to the same region on the $l$-th level of the hierarchy (e.g., $\delta(C^{[2]}_A, C^{[2]}_B) = 1$ when towns A and B are situated in the same statistical subregion).
The value of the modularity $Q^{[l]}_c$ of a cluster/region $C^{[l]}_c$ can be positive, negative, or zero. Should it be equal to zero, the community has as many links as the null model predicts.
When the modularity is positive, the $C^{[l]}_c$ subgraph tends to be a community that exhibits a stronger degree of internal cohesion than the model predicts.
Using the proposed matrix representation, the calculation of the internal links at a given level of the hierarchy is straightforward, so the modularity can be easily calculated based on the diagonal elements of the adjacency matrices of the network and its null model:

$$Q^{[l]} = \sum_c Q^{[l]}_c = \frac{1}{L}\sum_c \left(a^{[l]}_{c,c} - p^{[l]}_{c,c}\right),$$

where $a^{[l]}_{c,c}$ represents the number of internal links in the $c$-th community/region on the $l$-th hierarchy level, while $p^{[l]}_{c,c}$ is the expected number of these internal links calculated by the null model.

Null Models for Representing Regional Attractiveness.
The critical element of the methodology is how the $p^{[1]}_{i,j}$ connection probabilities of the towns are calculated. The most widely applied null model is the random configuration model, which calculates the edge probabilities assuming a random graph conditioned to preserve the degree sequence of the original network:

$$p^{[1]}_{i,j} = \frac{k^{[1,out]}_i \, k^{[1,in]}_j}{L}.$$

This randomized null model is inaccurate in most real-world networks [41].
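The following minimal sketch shows how the random configuration (Newman-Girvan) null model and the region-wise modularity could be computed from a town-level weight matrix. The function names, the toy matrix, and the `labels` vector (the region index of each town) are hypothetical; the sketch assumes that the per-region modularity is the difference between the observed and expected internal weights divided by the total weight.

```python
import numpy as np

def ng_null_model(A):
    """Expected weights k_out[i] * k_in[j] / L of the directed configuration model."""
    return np.outer(A.sum(axis=1), A.sum(axis=0)) / A.sum()

def aggregate(M, labels, n_regions):
    """Sum a town-level matrix into region-level blocks."""
    S = np.zeros((M.shape[0], n_regions))
    S[np.arange(M.shape[0]), labels] = 1.0   # town-to-region indicator matrix
    return S.T @ M @ S

def region_modularity(A, P, labels, n_regions):
    """Per-region modularity from the diagonals of the aggregated matrices."""
    A_l = aggregate(A, labels, n_regions)
    P_l = aggregate(P, labels, n_regions)
    return (np.diag(A_l) - np.diag(P_l)) / A.sum()

# Hypothetical toy network: 3 towns, the first two belong to region 0.
A = np.array([[5.0, 1.0, 0.0],
              [2.0, 3.0, 0.0],
              [0.0, 1.0, 4.0]])
labels = np.array([0, 0, 1])

P = ng_null_model(A)
Q_c = region_modularity(A, P, labels, n_regions=2)
Q = Q_c.sum()
```

A positive entry of `Q_c` indicates that the region holds more internal links than the null model expects; in this toy case both regions are positive.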
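As we measure the attractiveness of the regions based on the probability of link formation, it is beneficial to utilize attractiveness-related variables in the model as well as to take the distance-dependent link structure into account. Firstly, we generalize the model by defining the node importance measures $I^{out}_i$ and $I^{in}_j$:

$$p^{[1]}_{i,j} = L \, I^{out}_i I^{in}_j.$$

As is expected from the null model, to fulfill the equality $\sum_{i,j} p^{[1]}_{i,j} = \sum_{i,j} a^{[1]}_{i,j} = L$, the importance measures are normalized as $\sum_i I^{out}_i = 1$ and $\sum_j I^{in}_j = 1$:

$$I^{out}_i = \frac{\left(N^{out}_i\right)^{\alpha}}{\sum_k \left(N^{out}_k\right)^{\alpha}}, \qquad I^{in}_j = \frac{\left(N^{in}_j\right)^{\beta}}{\sum_k \left(N^{in}_k\right)^{\beta}}, \tag{15}$$

where the parameters $\alpha, \beta > 0$ reflect the importance of the $N^{out}_i$ and $N^{in}_j$ variables used to express the probability of forming an edge from the $i$-th to the $j$-th node. Please note that when $\alpha = 1$, $\beta = 1$, $N^{out}_i = k^{[1,out]}_i$, and $N^{in}_j = k^{[1,in]}_j$ (so that both normalizing sums are equal to $L$), the model is identical to the random configuration model of a weighted directed graph.
To model the probability of distance-dependent link formation, the model defined by (15) is extended by a deterrence function $f(d_{i,j})$, which describes the effect of space [20]:

$$p^{[1]}_{i,j} = L \, I^{out}_i I^{in}_j f(d_{i,j}). \tag{16}$$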
The function $f(d_{i,j})$ can be directly measured from the data by a binning procedure similar to that used in [19]:

$$f(d) = \frac{1}{L} \, \frac{\sum_{i,j \mid d_{i,j} = d} a^{[1]}_{i,j}}{\sum_{i,j \mid d_{i,j} = d} I^{out}_i I^{in}_j},$$

which is proportional to the weighted average of the probability $(1/L)\, a^{[1]}_{i,j} / (I^{out}_i I^{in}_j)$ of a link existing at distance $d$. When the distance dependence of the connection probability is handled by an explicit function, various modifications of the gravity law-based configuration model can be defined: $f(d) = d_{i,j}^{-\gamma}$ [34,46], $f(d) = \exp(-d_{i,j}/\kappa)$ [47], or $f(d) = d_{i,j}^{-\gamma} \exp(-d_{i,j}/\kappa)$ [48].
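A binned estimate of the deterrence function can be sketched as follows. This is an illustrative implementation under stated assumptions, not the procedure used to produce the reported results: the function names, the one-dimensional toy coordinates, and the choice of assigning each pair to the bin `floor(d / bin_width)` are hypothetical. The estimate in each bin is the observed weight divided by `L` times the summed product of the source and target importance measures.

```python
import numpy as np

def empirical_deterrence(A, I_out, I_in, D, bin_width=10.0):
    """Binned deterrence estimate: observed weight / (L * expected importance) per bin."""
    L = A.sum()
    bins = np.floor(D / bin_width).astype(int)            # distance bin of every pair
    n_bins = bins.max() + 1
    observed = np.bincount(bins.ravel(), weights=A.ravel(), minlength=n_bins)
    expected = np.bincount(bins.ravel(),
                           weights=np.outer(I_out, I_in).ravel(),
                           minlength=n_bins)
    f = np.divide(observed, L * expected,
                  out=np.zeros(n_bins), where=expected > 0)
    return f, bins

def spatial_null_model(A, I_out, I_in, D, bin_width=10.0):
    """Expected weights L * I_out[i] * I_in[j] * f(d_ij) with the binned deterrence."""
    f, bins = empirical_deterrence(A, I_out, I_in, D, bin_width)
    return A.sum() * np.outer(I_out, I_in) * f[bins]

# Hypothetical toy data: 4 towns on a line (positions in km).
x = np.array([0.0, 5.0, 30.0, 60.0])
D = np.abs(x[:, None] - x[None, :])
A = np.array([[4.0, 2.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0],
              [1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 2.0]])
I_out = A.sum(axis=1) / A.sum()   # degree-based importance, normalized to sum to one
I_in = A.sum(axis=0) / A.sum()

P_spa = spatial_null_model(A, I_out, I_in, D)
```

With this construction the expected weight in every distance bin matches the observed weight of that bin, so the expected weights sum to `L` without a separate normalization step.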
To ensure that the sum of the expected number of links is equal to $L$ (see (16)), in this distance-dependent model $f$ should be normalized as

$$\sum_{i,j} I^{out}_i I^{in}_j f(d_{i,j}) = 1.$$

Several models can be defined based on which indicators are selected. When the nodes are considered to be equally important, in other words, $N^{out}_i = N^{in}_j = 1$, only the distance determines the link formation probability, $f(d_{i,j})$. The importance of the nodes can be interpreted as the number of investors and companies, so $N^{out}_i = N^{[1]}_{p,i}$ and $N^{in}_j = N^{[1]}_{c,j}$. Finally, socioeconomic indicators, like the number of inhabitants, or their complex combinations can be utilized.
When $f(d) = 1/d^{\gamma}_{i,j}$, the parameters $\alpha$, $\beta$, and $\gamma$ can be estimated as a regression problem. The identified parameters indicate the sensitivity, i.e., the importance, of the variables, which can be sorted by their importance as suggested in classical gravity law-based studies, as in [20].
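As an illustration of the regression view, the sketch below estimates the exponents of a power-law gravity model by ordinary least squares on the logarithms of the positive, nonzero-distance weights. This log-linear fit is a swapped-in simplification and an assumption of this example; it is not necessarily the estimation procedure used in the study. The scale constant `theta`, the function names, and the synthetic data are hypothetical.

```python
import numpy as np

def gravity_weights(N_out, N_in, D, theta, alpha, beta, gamma):
    """theta * N_out[i]**alpha * N_in[j]**beta / d_ij**gamma; zero-distance pairs get 0."""
    with np.errstate(divide="ignore"):
        F = np.where(D > 0, D ** (-gamma), 0.0)
    return theta * np.outer(N_out ** alpha, N_in ** beta) * F

def fit_gravity_loglinear(A, N_out, N_in, D):
    """Least-squares fit of log a_ij = log theta + alpha*log N_out + beta*log N_in - gamma*log d."""
    i, j = np.nonzero((A > 0) & (D > 0))      # only positive weights at positive distance
    X = np.column_stack([np.ones(i.size),
                         np.log(N_out[i]),
                         np.log(N_in[j]),
                         -np.log(D[i, j])])
    y = np.log(A[i, j])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.exp(coef[0]), coef[1], coef[2], coef[3]

# Synthetic check: generate weights from known parameters and recover them.
xy = np.array([[0.0, 0.0], [30.0, 40.0], [70.0, 10.0], [120.0, 90.0]])
D = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
N_out = np.array([10.0, 20.0, 50.0, 100.0])
N_in = np.array([5.0, 40.0, 15.0, 80.0])
A = gravity_weights(N_out, N_in, D, theta=0.5, alpha=0.4, beta=0.8, gamma=1.5)

theta, alpha, beta, gamma = fit_gravity_loglinear(A, N_out, N_in, D)

# Null model rescaled so that the expected weights sum to the observed total.
P = gravity_weights(N_out, N_in, D, 1.0, alpha, beta, gamma)
P *= A.sum() / P.sum()
```

Pairs at zero distance (internal links) are excluded from this sketch because the power-law deterrence is undefined there; how such pairs are treated is a modeling decision that the example does not settle.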

Economic Relations of the Regions.
Connections that interlink communities indicate their relationships and the possibility of merging modules/regions that are strongly connected. We combine regions and determine the gain of the merged modularity in a similar way to the Louvain community detection algorithm [42]. The modularity change $\Delta M^{[l]}_{i,j}$ obtained by merging the $i$-th and $j$-th communities can be calculated as the difference between the actual and predicted number of interlinking edges:

$$\Delta M^{[l]}_{i,j} = \left(a^{[l]}_{i,j} + a^{[l]}_{j,i}\right) - \left(p^{[l]}_{i,j} + p^{[l]}_{j,i}\right).$$

The resultant symmetric modularity gain matrix can be calculated as

$$\Delta \mathbf{M}^{[l]} = \left(\mathbf{B}^{[l]}\right)^{T} + \mathbf{B}^{[l]}, \tag{21}$$

where $\mathbf{B}^{[l]} = \mathbf{A}^{[l]} - \mathbf{P}^{[l]}$ is the so-called modularity matrix [38].
The Louvain algorithm moves a node $i$ into the community for which the gain in modularity is the largest. If no positive gain occurs, $i$ remains in its original community. After merging the nodes/regions, a new network is constructed whose nodes are the communities identified earlier. This method can be used to explore regions (modules) formed by the elements of the $l$-th level of the settlement hierarchy with different null models. Although model-based communities can be identified by this approach and compared to regions of a higher hierarchy level as modules of ground truth, the main goal of the analysis of $\Delta \mathbf{M}^{[l]}$ is to measure the strength of relationships between the regions.
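The modularity gain matrix and a single greedy merge decision can be sketched as follows. This is only one step of a merging heuristic under stated assumptions, not an implementation of the full Louvain algorithm; the function names and the toy matrix are hypothetical, and the random configuration model is used as the null model purely for illustration.

```python
import numpy as np

def modularity_gain_matrix(A, P):
    """Symmetric gain matrix B^T + B, where B = A - P is the modularity matrix."""
    B = A - P
    return B.T + B

def best_merge(A, P):
    """Return (i, j, gain) for the pair with the largest positive gain, or None."""
    gains = modularity_gain_matrix(A, P)
    np.fill_diagonal(gains, -np.inf)          # a region cannot be merged with itself
    i, j = np.unravel_index(np.argmax(gains), gains.shape)
    if gains[i, j] <= 0:
        return None                           # no merge improves the modularity
    return int(i), int(j), float(gains[i, j])

# Hypothetical region-level weights: regions 0 and 1 are strongly interlinked.
A = np.array([[2.0, 4.0, 0.0],
              [3.0, 2.0, 0.0],
              [0.0, 1.0, 4.0]])
P = np.outer(A.sum(axis=1), A.sum(axis=0)) / A.sum()   # configuration null model

dM = modularity_gain_matrix(A, P)
merge = best_merge(A, P)
```

In this toy case only the pair (0, 1) has a positive gain, so it is the only candidate for merging; the gains are expressed in numbers of links, that is, without dividing by the total weight.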
The following section demonstrates the applicability of the previously presented toolset in the analysis of the network of Hungarian companies.

[…] of the towns and regions as the generated network covers more than 90% of the investment-type connections.

Results and Discussion
The owners and companies were assigned to settlements, and the related settlement hierarchy covers 3,155 towns (level LAU 2, formerly level NUTS 5), 175 statistical subregions (level LAU 1, formerly level NUTS 4), 20 small regions/counties at level NUTS 3, and 7 regions at level NUTS 2.
74% of the connections remain within the borders of the towns, which also reflects the high degree of modularity of the network (for more details, see Table 1). 302,781 connections are within Budapest, 45,559 connections point out of the city, and 89,944 connections point into the capital. The map of the regional connections between the people and companies can be generated using the obtained connectivity matrix and the latitudes and longitudes of the towns (see Figure 2). It can be seen that the network reveals a hierarchical and modular structure, reflecting that the Hungarian economy is concentrated around the capitals of the counties and Budapest, the capital of the country. The majority of the companies are situated in these locations; consequently, the network follows the structure of online social networks [44]; in other words, it is also structured according to the settlement hierarchy, in which Budapest is the absolute center of the network and the centers of counties also function as hubs.

[…] company-ownership network. As shown in Figure 3, these densities are significantly higher in most subregions, and a negative correlation exists between the size of the regions and the number of their inner connections (correlation coefficient 0.298, $p < 10^{-4}$). As illustrated by the results, smaller locations are much more isolated than larger ones, like Budapest. The same result is obtained by the analysis of the external density-based openness measure, which we consider to be a main measure of attractiveness (see Appendix A for more details). As shown in Figure 4, bigger regions exhibit larger openness values, reflecting their higher degree of attractiveness (correlation coefficient 0.94, $p < 10^{-10}$).

The Effect of Geographical Distance.
To address the effect of distance decay on link formation, the observed ties between the towns were compared with their expected number calculated from a probabilistic model.
A resolution of 10 km was used for binning the distance distribution (see Figure 5). The exponent of distance decay according to our data is -1.1057. It should be noted that the effect of the capital city is so strong that the probability of forming connections with Budapest is less distance-dependent, and the exponent of distance decay with regard to these connections is only -0.6385.
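A distance-decay exponent of this kind can be estimated from binned data by a straight-line fit in log-log space. The sketch below is a minimal example on synthetic values that follow a power law with the exponent reported above; the function name, the bin centers, and the scale constant are hypothetical, and the fitting procedure behind the reported exponents may differ.

```python
import numpy as np

def fit_decay_exponent(d, f):
    """Fit log f = slope * log d + intercept and return (slope, exp(intercept))."""
    mask = (d > 0) & (f > 0)                  # the logarithm needs positive values
    slope, intercept = np.polyfit(np.log(d[mask]), np.log(f[mask]), 1)
    return slope, np.exp(intercept)

# Synthetic binned data: 10 km bins represented by their centers.
d = np.arange(5.0, 305.0, 10.0)
f = 3.0 * d ** -1.1057                        # exact power law, no noise

slope, scale = fit_decay_exponent(d, f)
```

On real binned counts, sparse long-distance bins are noisy, so the fitted slope depends on the bin width and on the distance range included in the fit.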
The distance-dependent link formation probability can be explained by the notion that the costs of establishing and maintaining the connections are also distance-dependent. This assumption is supported by the fact that distance has a much stronger effect on investment ties than on online social networks in Hungary (where the exponent of distance decay is -0.6) [44], probably because the cost of keeping online connections is less dependent on distance than the management of a company far from the permanent address of the owner.

Five sets of $N^{out}_i$, $N^{in}_j$ variables were defined, including simple metrics like the numbers of nodes and edges at the town level in addition to socioeconomic variables, like the number of inhabitants and Total Domestic Income (total income received by all sectors of the economy including the sum of all wages, profits, and taxes, minus subsidies). Based on the combination of different variables and distance functions, 15 different models were identified:

$$\min_{\alpha, \beta, \gamma} E(\alpha, \beta, \gamma) = \frac{1}{L} \left\| \mathbf{A}^{[1]} - \mathbf{P}^{[1]} \right\|_2 .$$

Comparison of the
As summarized in Table 2, by taking the distance into account, the accuracy of the model is significantly improved. Among distance-dependent models, the gravity models perform best (in comparison, the accuracy of the distance-independent random configuration model is 0.16494). The Total Domestic Income (TDI) is one of the best indicators. The identified $\alpha$, $\beta$, and $\gamma$ parameters reflect the importance of the $N^{out}_i$, $N^{in}_j$, and $d_{i,j}$ variables in the models. For example, in the case where $N^{out}_i = TDI_i$ and $N^{in}_j = TDI_j$, the resultant nonlinear regression model is $p_{i,j} = 0.12 \cdot \left((TDI_i)^{0.37} \cdot (TDI_j)^{0.81}\right) / d_{i,j}^{1.58}$ (see Table 3), which can be interpreted as follows: the number of connections between location $i$ and location $j$ increases by 0.37% as a result of a 1.0% growth of TDI in location $i$. Similarly, the number of connections between location $i$ and location $j$ increases by 0.81% as a result of a 1.0% growth of TDI in location $j$. According to the gravity-type models, the importance of the target/destination locations ($\beta$) is greater than the importance of the sources ($\alpha$) regardless of how the strengths of the nodes are interpreted.

Figure 6: Comparison between the edge weights $a^{[1]}_{i,j}$ and their estimated values $p^{[1]}_{i,j}$ generated by different null models on the town level (LAU 2) of the settlement hierarchy when $N^{out}_i = k^{[1,out]}_i$ and $N^{in}_j = k^{[1,in]}_j$. The + symbols represent the inner connections that form a separate cluster. This plot directly reflects the goodness of fit as the model estimates the connections of the towns.
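The percentage interpretation of the exponents of the TDI-based gravity model follows from taking logarithms. In the short derivation below, $\theta$ is a hypothetical symbol for the fitted scale constant (0.12), and the rounded exponents 0.37 and 0.81 are those quoted in the text; the derivation is a sketch of the standard elasticity argument and holds for small relative changes.

```latex
\begin{align}
\ln p_{i,j} &= \ln\theta + \alpha \ln \mathrm{TDI}_i + \beta \ln \mathrm{TDI}_j - \gamma \ln d_{i,j}, \\
\frac{\partial \ln p_{i,j}}{\partial \ln \mathrm{TDI}_i} &= \alpha, \qquad
\frac{\partial \ln p_{i,j}}{\partial \ln \mathrm{TDI}_j} = \beta, \\
\frac{p_{i,j}(1.01\,\mathrm{TDI}_i)}{p_{i,j}(\mathrm{TDI}_i)} &= 1.01^{0.37} \approx 1.0037, \qquad
\frac{p_{i,j}(1.01\,\mathrm{TDI}_j)}{p_{i,j}(\mathrm{TDI}_j)} = 1.01^{0.81} \approx 1.0081 .
\end{align}
```

A 1% growth of TDI at the source therefore raises the expected number of connections by about 0.37%, while the same growth at the target raises it by about 0.81%.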

Evaluation of the Modularities.
As modularity-based community detection evaluates the set of edges with $a^{[1]}_{i,j} > p^{[1]}_{i,j}$ (and the related nodes), whose weights are underestimated by the null model (see (11)), we designed a plot that compares $a^{[1]}_{i,j}$ with $p^{[1]}_{i,j}$ to highlight the set of potential edges that can be used to form communities.
Four null models based on the variables of the Newman and Girvan model, $N^{out}_i := k^{[1,out]}_i$ and $N^{in}_j := k^{[1,in]}_j$, are compared in Figure 6. In all models, the inner connections (represented by +) form a separate cluster, which confirms that 74% of the connections remain within the borders of the towns. The first model (the NG random configuration model) shows that more inner connections exist than would be expected based on the random configuration network. The spatial models handle the distance dependence of the connections, so a slightly smaller difference is shown between the observed and expected numbers of inner connections. Figure 7 shows that the qualitative behavior of the models does not change during the aggregation procedure.
The difference between the observed and expected numbers of connections is larger in the case of smaller settlements, which indicates that small regions are not as attractive as would be expected from their number of nodes. The gravity model estimates the inner connections well thanks to the exponents $\alpha = 0.35652$ and $\beta = 0.69045$, which indicate that an increase in the number of connections affects the attractiveness in a nonlinear fashion. This phenomenon is much more interesting when the utilized variables can be interpreted as economic potentials. When TDI is applied in the gravity model, $\alpha = 0.37367$ and $\beta = 0.81425$. These values and Figure 8 confirm that gravity-based models behave similarly and, therefore, reflect the same mechanism of attractiveness.

Forming Communities.
Connections that interlink communities are indicative of their relationships. The effect of these interlinks can be studied by the change in modularity (see (21)) expressed as $\Delta \mathbf{M}^{[l]} = \left(\mathbf{B}^{[l]}\right)^{T} + \mathbf{B}^{[l]}$.
To determine the community structure, the MATLAB implementation [49] of the greedy Louvain algorithm [50] was used. Towns and subregions were used as the initial partition of $\mathbf{B}^{[l]}$. As shown in Figure 9, the community structure formed based on the NG null model almost perfectly reconstructs the counties, confirming that the settlement structure is reflected in the personal investments. Different null models provide different viewpoints with regard to community detection. The NG null model does not handle the distance dependence of the connections, so the matrix $\mathbf{B}^{[l]} = \mathbf{A}^{[l]} - \mathbf{P}^{[l]}$ of the modeling errors reflects the distance dependence of the connections. Therefore, the resulting communities form spatial clusters. On the contrary, communities formed by the gravitational models reflect distance-dependent differences less. According to the resultant maps, the attractiveness of Budapest is highlighted, as only small, closed regions were not assigned to the module of the capital (see Figure 10(a)). It is interesting to note that all the centers of counties were assigned to the community of Budapest in the gravitational model, which also confirms the hierarchical structure of the network. To highlight the hierarchical structure and increase the sensitivity of the model, a resolution parameter was introduced into the model (see Appendix B) that can be adapted to detect similar region-pairs, as shown in Figure 10(b).

Figure 7: Comparison between the edge weights $a^{[2]}_{i,j}$ and their estimated values $p^{[2]}_{i,j}$ generated by different null models at level LAU 1 of the settlement hierarchy when $N^{out}_i = k^{[1,out]}_i$ and $N^{in}_j = k^{[1,in]}_j$. The + symbols represent the inner connections that form a separate cluster. This plot reflects that, during the aggregation procedure, the qualitative behavior of the models does not change; furthermore, the same phenomena can be observed as in Figure 6.

Figure 8: Comparison between the edge weights $a^{[2]}_{i,j}$ and their estimated values $p^{[2]}_{i,j}$ generated by the gravity null model at level LAU 1 of the settlement hierarchy when $N^{out}_i = TDI_i$ and $N^{in}_j = TDI_j$. The + symbols represent the inner connections that form a separate cluster.
Communities formed with the NG null model (see Figure 9) and the TDI-based gravity models (see Figure 10) differ significantly. The interpretation of the communities and these differences should rely on an understanding of the concept of modularity. The utilised modularity detection algorithm generates partitions in which the links are more abundant within communities than would be expected from the employed model.
As the NG null model only uses the basic structural information encoded in the adjacency matrix, when the probabilities of the connections are dependent on distance, the resulting communities will represent closer geographical regions. As Table 1 and Figures 6 and 7 show, most of the connections remain within the county borders, so it is natural that the resultant 30 communities are almost identical to the counties.
Since the Hungarian road network reflects the administrative regions, it can be shown that the distance strongly affects the probability of the connections. This distance dependence of the connection probability can be incorporated into the null model by the proposed gravity model. In this case, the resultant communities will reflect another unmodelled surplus in the number of connections. When the attractiveness and the distances are considered in the null model, the communities will reflect the additional economic attractiveness/similarity of the regions.
As Figure 10 shows, the algorithm generates a huge cluster of well-developed regions containing Budapest, the larger cities, and county seats with high TDIs, and several small communities related to isolated and less developed subregions.

Conclusions
Regional policy-making and monitoring are firm-centered, incentive-based, and state-driven.Personal investments define ties between geographical locations.We analyzed the structure of this ownership network and proposed a methodology to characterize regional attractiveness based on a set of null models identified to approximate the probabilities of link formation.According to the levels of the settlement hierarchy, a four-level study was conducted.
Based on the calculation of the internal and external network densities, several measures were proposed to evaluate the attractiveness and development of towns and geographical regions.The results indicate that small and less competitive regions have less internal connections, while larger cities are much more open.
To provide a more in-depth insight into the network, the dependence of link formation on distance was studied.The probability of connections between owners and their companies shows a much more rapid degree of distance decay than experienced in social networks.The attractiveness of the capital is so high that its connections are much less dependent on distance than other cities.
Based on the combination of three deterrence models and five sets of indicators, 15 different null models were identified besides the classical Newman-Girvan random configuration model.Communities statistically have more significant edge weights that would be wired according to the null model.As it was highlighted that underestimated link probabilities are the sources of modularity, a scatter plot was designed to visualize how the null model approximates the real structure of the network.
The identification of gravity-type models highlighted that link formation is nonlinearly dependent on the studied variables.Furthermore, the target nodes are much more important when determining the probability of link formation than the source nodes which also confirms why the structural analysis of company-ownership networks can be used to measure regional attractiveness.
We applied the Louvain community detection algorithm to form clusters of cities and subregions and compared the resultant communities to administrative regions. The more closely the null model approximates the real structure of the network, the lower the modularity is expected to be. As community detection forms modules whose internal link densities are significantly higher than what would be expected from the applied null model, the spatial clusters highlighted by the distance-independent random configuration model are almost identical to the counties. Communities generated based on the gravitational models, which correctly estimate the number of internal nodes and the dependence of link formation on distance, exploited the attractiveness of the capital: they form a massive cluster that includes most of the county centers, bigger cities, and the competitive touristic regions, while the remaining small clusters reflect isolated regions that are less developed and less attractive.

B. Improvement of the Resolution
The modularity always increases when small communities are assigned to one group [52]. Modularity optimization with the Newman-Girvan null model has a resolution threshold, which means that it fails to identify small communities in large networks and communities consisting of fewer than $\sqrt{L/2}-1$ internal links [53]. To handle this problem, Reichardt and Bornholdt (RB) generalized the modularity function by introducing an adjustable resolution parameter $\gamma$ that scales the null-model term [54,55]. Arenas, Fernandez, and Gomez (AFG) also proposed a multiresolution method by adding self-loops of weight $r$ to each node [56]. This algorithm increases the strength of a node without altering the topological characteristics of the original network, as $A_r = A + rI$, where $I$ denotes the identity matrix and $r$ the weight of the self-loops of each node; consequently, the in- and out-strength of every node increases by $r$. These methods still have an intrinsic limitation: large communities may be split before small communities become visible. Theoretical results indicate that this limitation depends on the degree of interconnectedness of small communities and the difference between the sizes of the communities, while being independent of the size of the whole network [52].
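As a sketch of the two multiresolution variants, the formulas below assume that the modularity defined in (11) has the usual form $Q=\frac{1}{L}\sum_{i,j}(a_{i,j}-p_{i,j})\,\delta(C_i,C_j)$ for a directed, weighted network with the configuration null model. The normalization, the symbols $Q_\gamma$ and $Q_r$, and the node count $n$ are assumptions introduced here for illustration and may differ from the notation of the main text.

```latex
\begin{align}
Q_\gamma &= \frac{1}{L}\sum_{i,j}\left(a_{i,j}-\gamma\,p_{i,j}\right)\delta(C_i,C_j), \\
Q_r &= \frac{1}{L_r}\sum_{i,j}\left(a_{i,j}+r\,\delta_{i,j}
      -\frac{k^{out}_{i,r}\,k^{in}_{j,r}}{L_r}\right)\delta(C_i,C_j), \\
L_r &= L + n\,r, \qquad k^{out}_{i,r}=k^{out}_{i}+r, \qquad k^{in}_{j,r}=k^{in}_{j}+r .
\end{align}
```

Here $\delta_{i,j}$ is the Kronecker delta that places the self-loops on the diagonal, and $\delta(C_i,C_j)$ equals one when nodes $i$ and $j$ belong to the same community. Setting $\gamma=1$ or $r=0$ recovers the assumed form of $Q$, while $\gamma>1$ or $r>0$ favors smaller communities.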
It should be noted that the modularity decreases when $p_{i,j}$ more closely approximates the real $a_{i,j}$ values, which is equivalent to finding the null model that most closely fits.
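The remark above can be illustrated on a toy network. The following is a minimal sketch, assuming a modularity of the form $Q=\frac{1}{L}\sum_{i,j}(a_{i,j}-p_{i,j})\,\delta(C_i,C_j)$; the four-node network, the function names, and the `blend` helper that interpolates between a null model and the observed weights are hypothetical and serve only to show the trend.

```python
def configuration_null_model(a):
    """Directed configuration model: p[i][j] = k_out[i] * k_in[j] / L."""
    n = len(a)
    total = sum(map(sum, a))
    k_out = [sum(row) for row in a]
    k_in = [sum(a[i][j] for i in range(n)) for j in range(n)]
    return [[k_out[i] * k_in[j] / total for j in range(n)] for i in range(n)]


def modularity(a, p, labels):
    """Q = (1/L) * sum over same-community pairs of (a[i][j] - p[i][j])."""
    n = len(a)
    total = sum(map(sum, a))
    q = sum(a[i][j] - p[i][j]
            for i in range(n) for j in range(n)
            if labels[i] == labels[j])
    return q / total


def blend(a, p, t):
    """Move the null model a fraction t of the way towards the observed weights."""
    n = len(a)
    return [[(1 - t) * p[i][j] + t * a[i][j] for j in range(n)] for i in range(n)]


# Toy network (assumed data): two tightly connected pairs joined by weak links.
a = [[0, 5, 1, 0],
     [5, 0, 0, 1],
     [1, 0, 0, 5],
     [0, 1, 5, 0]]
labels = [0, 0, 1, 1]
p0 = configuration_null_model(a)

for t in (0.0, 0.5, 1.0):
    print(t, round(modularity(a, blend(a, p0, t), labels), 4))
```

As `t` grows, the null model explains more of the observed edge weights and the modularity of the same partition shrinks towards zero, which is why a low modularity can signal a well-fitting null model rather than the absence of structure.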

C. Network Topology Analysis
The degree distribution was determined at all levels of the settlement hierarchy by following the methodology presented in [13]. Figure 11 shows that the distribution exhibits small-degree saturation and a high-degree cutoff. Several distribution functions were fitted. The two-sided Vuong's test statistic [57] showed that the exponential and Poisson distributions, which reflect the randomness of connections, could be rejected. According to this test, the power-law distribution cannot be rejected. The estimated parameters are shown in Table 4. The power-law distribution of the incoming and outgoing connections reflects the preferential attachment-type structure of the network.
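The fitting procedure of [13] and the Vuong test are not reproduced here. As a minimal sketch of how a power-law exponent can be estimated, the code below applies the standard maximum-likelihood estimator for a continuous power law to synthetic data; the continuous approximation, the choice of `x_min`, the sample, and the function names are all assumptions made for illustration and do not reproduce the values in Table 4.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_power_law_exponent(values, x_min):
    """Maximum-likelihood estimate of alpha and its standard error for x >= x_min."""
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha = 1.0 + n / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(n)


# Synthetic "degrees" with a known exponent, used only to check the estimator.
rng = random.Random(0)
degrees = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat, std_err = fit_power_law_exponent(degrees, x_min=1.0)
print(f"alpha = {alpha_hat:.2f} +/- {std_err:.2f}")
```

Real degree sequences are discrete and show saturation at small degrees, so in practice the lower cutoff `x_min` has to be chosen carefully and the fit compared against alternative distributions, as described above.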
In hierarchical networks, nodes with high degree tend to connect to nodes that are less connected to others [58]. Therefore, the hierarchical structure of the network is reflected by the dependence of the local clustering coefficient $C(k)$ on the degree $k$ of the nodes. As Figure 12 shows, $C(k)$ decreases with increasing $k$ as $C(k) \approx k^{-0.3}$, which indicates the hierarchical structure of the network [58,59].
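The dependence of the clustering coefficient on the degree can be computed as in the following minimal sketch. It assumes that the network is treated as undirected and unweighted, which may differ from the treatment behind Figure 12; the toy graph and the function name are hypothetical.

```python
from collections import defaultdict


def clustering_by_degree(neighbors):
    """Average local clustering coefficient C(k) over all nodes of degree k >= 2."""
    values = defaultdict(list)
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            continue  # C is undefined for nodes with fewer than two neighbours
        # Count the links that exist among the neighbours of this node.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in neighbors[u])
        values[k].append(2.0 * links / (k * (k - 1)))
    return {k: sum(c) / len(c) for k, c in sorted(values.items())}


# Toy graph: hub 0 with four neighbours, two of which (1 and 2) are linked.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

c_of_k = clustering_by_degree(neighbors)
print(c_of_k)
```

In the toy graph the hub has a much lower clustering coefficient than its low-degree neighbours, which is the qualitative pattern that a decreasing $C(k)$ captures. Fitting a line to $\log C(k)$ against $\log k$ would then give the scaling exponent.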

D. Notations
Superscript $[l]$: Aggregation of a quantity at level $l$ of the settlement hierarchy
$A^{[p,c]}$: Biadjacency matrix of the person-company ownership network
$a^{[p,c]}_{i,j}$: An element (edge weight) of the $A^{[p,c]}$ biadjacency matrix of the person-company ownership network
$A^{[p,l]}$, $A^{[c,l]}$: Incidence matrices of the person-location and company-location bipartite networks at level $l$ of the settlement hierarchy
$A^{[l]}$: Simpler notation of the adjacency matrix of the location network at level $l$ of the settlement hierarchy (see (3))
$N^{[c,l]}_i$, $N^{[p,l]}_i$: Numbers of companies and people in the $i$-th region at level $l$ of the settlement hierarchy
$N^{[c]}$, $N^{[p]}$: Number of companies and people/owners/investors in the network
$L$: Number of links in the network
$C$: Set of communities (each node is a member of exactly one community)
$C^{[l]}$: Set of communities at level $l$ of the settlement hierarchy ($C^{[1]}$ denotes the set of towns)
$Q$: Modularity of the network, a special goodness-of-fit metric defined by (11)
$Q_c$: Modularity of community $c$ (the sum of the modularity of each community yields the modularity $Q$ of the network)

Further quantities used in the text: the number of communities at level $l$ of the settlement hierarchy; a general metric, as a function of the community structure, that indicates the goodness-of-fit of the communities on the basis of the connectivity of the nodes in them, together with its value for the community structure given by level $l$ of the settlement hierarchy; and, for the $c$-th community at level $l$ of the settlement hierarchy, the internal and external densities (defined by (A.2) and (A.3)), the openness (A.4), the expansion (A.5), the link-collection ability (A.6), and the cut ratio (A.7).

[
,]  )  and   = ( [,]  )  .The null model can be defined based on the random configuration model, which results in the selection of the variables as   = ( [,]  )  and   = ( [,]

3.1. Description of the Studied Dataset. The studied dataset represents $L$ = 1,077,090 ownership relations between $N^{[p]}$ = 531,249 people and $N^{[c]}$ = 868,591 Hungarian companies in 2013. It should be noted that fewer than 10% of the ownership connections are defined based on how companies possess shares in other companies, so, although only personal investments are studied, the results reflect the attractiveness

Figure 2: Map of the town-level company-ownership network. Edges with more than 10 ownership connections are shown. Edges connected to the capital (Budapest) are denoted by green lines.

Figure 3: Network density as a function of the number of inhabitants at the LAU 1 level.

Figure 4: Openness of small regions (LAU 1 level) as a function of the number of their inhabitants.

Figure 6: Comparison between the number of the edge weights $a^{[1]}_{i,j}$ and their estimated values $p^{[1]}_{i,j}$ generated by different null models on the town level (LAU 2) of the settlement hierarchy when

Figure 7: Comparison between the number of the edge weights $a^{[2]}_{i,j}$ and their estimated values $p^{[2]}_{i,j}$ generated by different null models at level LAU 1 of the settlement hierarchy when

Figure 8: Comparison between the number of the edge weights $a^{[2]}_{i,j}$

Figure 9: Communities formed by the Louvain method and Newman-Girvan (NG) null model (  =    and   =    ) reflect the settlement hierarchy as the resultant communities are almost identical to the counties.

Figure 10: Communities formed by the Louvain method and gravitational null models reflect the attractiveness of Budapest as only less developed closed regions were not assigned to the module of the capital.

Figure 11: Distribution of the  [1,]  edges at the LAU 2 settlement hierarchy level.

Figure 12: Local clustering coefficient as a function of the  [1,]  node degrees.
$k^{[l,in]}_i$: In-degree of the $i$-th node (geographic region) at level $l$ of the settlement hierarchy
$k^{[l,out]}_i$: Out-degree of the $i$-th node (geographic region) at level $l$ of the settlement hierarchy

Table 1: Number of edges inside the settlement hierarchies.

Table 2: Performances of distance-dependent null models.
(19) Models.Based on the utilized distance function, three different types of models can be defined.When () is a deterrence function defined by(19), the models are denoted as   =       (). , = (   )  (   )  () represents the parametric version of this model, when the exponents  and  are optimized to achieve a more accurate approximation of connections between towns.
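A gravity-type null model of this kind can be sketched as follows. The deterrence function of (19) is not reproduced here, so the code assumes a power-law deterrence $d^{-\text{decay}}$; the parameter names `alpha`, `beta`, and `decay`, the rescaling of the estimates to the total edge weight, and the toy data (including the diagonal of the distance matrix, which stands for an assumed internal distance) are hypothetical choices made for illustration.

```python
def gravity_null_model(source, target, dist, total, alpha=1.0, beta=1.0, decay=1.0):
    """Estimate edge weights as source[i]^alpha * target[j]^beta * dist[i][j]^(-decay).

    The estimates are rescaled so that they sum to the total edge weight.
    The power-law deterrence function is an assumption of this sketch.
    """
    n = len(source)
    raw = [[source[i] ** alpha * target[j] ** beta * dist[i][j] ** (-decay)
            for j in range(n)] for i in range(n)]
    scale = total / sum(map(sum, raw))
    return [[scale * value for value in row] for row in raw]


# Toy data: out- and in-strengths of three locations and their distances.
k_out = [6, 2, 4]
k_in = [3, 5, 4]
dist = [[1.0, 2.0, 4.0],
        [2.0, 1.0, 2.0],
        [4.0, 2.0, 1.0]]

flat = gravity_null_model(k_out, k_in, dist, total=12, decay=0.0)
spatial = gravity_null_model(k_out, k_in, dist, total=12, decay=2.0)
print(round(flat[0][2], 3), round(spatial[0][2], 3))
```

With `decay=0` and unit exponents the model reduces to the random configuration model, while a positive `decay` shifts expected weight from distant pairs to nearby ones. In the parametric version, `alpha` and `beta` would be optimized against the observed edge weights.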

Table 4: Parameters of the power-law distributions fitted to networks at different settlement hierarchy levels.