On the Emergence of Islands in Complex Networks

1 Instituto de Investigación en Comunicación Óptica (IICO), Universidad Autónoma de San Luis Potosí (UASLP), San Luis Potosí, SLP, Mexico
2 Facultad de Ingeniería, Universidad Autónoma de San Luis Potosí (UASLP), San Luis Potosí, SLP, Mexico
3 Facultad de Ciencias, Universidad Autónoma de San Luis Potosí (UASLP), San Luis Potosí, SLP, Mexico
4 Universidad Politécnica de San Luis Potosí (UPSLP), San Luis Potosí, SLP, Mexico


Introduction
Research in complex networks (CN) has risen in interest and importance given that many natural and artificial systems can be abstracted, modeled, and analyzed using this type of network. Examples of such systems are numerous: neuron connectivity [1], the plant pollination process [2], gene inheritance [3], metabolic interactions [4, 5], highway and road networks [6], emails [7], sexual partners [8], and many others.
Before 1998, most networks were studied using the random network model, which assumes that each node randomly chooses the other nodes to which it connects. This random selection process produces certain topological properties [9]. For example, both the in-degree and out-degree distributions may approximate either a Poisson or an exponential function [9].
Random models were employed before real data became available to verify the properties and characteristics obtained from them. Even so, it was difficult to emulate the behavior of real networks because many of the factors needed to perform these simulations were unknown. It was in this context that Paxson and Floyd published an article describing the main difficulties in simulating the Internet [10].
A new insight was provided by Redner in 1998, when he published a study of the distribution of citations in scientific publications [11]. In his study, publications are described as a network in which each article is represented as a node and each citation between papers as an edge. Redner discovered that the tail of the citation distribution decays as a power-law with an exponent $\gamma = 3$.
In 1999, Faloutsos et al. published important topological properties of the Internet [12]. Among the most interesting is the conclusion that the Internet's out-degree and in-degree distributions at the autonomous system scale follow a power-law [12]. In that same year, Adamic et al. showed that some topological properties of the WWW also follow a power-law distribution [13].
These studies suggest that these networks cannot be analyzed using the random model, since some of their connectivity properties behave not as a Poisson or an exponential function but rather as a power-law function.
It is important to notice that although these networks are fundamentally different, they still have similar properties. For example, even though the Internet has doubled its number of nodes, its diameter has not changed significantly. In this context, the diameter is the number of nodes that must be visited along the longest of all the shortest paths between all pairs of nodes in the network.
These investigations were the watershed that initiated the search for power-law behaviors and other properties in systems that can be modeled as CN. In a few years this search has led to discoveries confirming that power-law distributions, together with properties such as clustering and diameter, can be found in most of these complex systems [14].
In 1998, Watts and Strogatz published a study of CN in which they affirmed that real networks are in an intermediate state between randomness and regularity. They called such systems small-world networks because of their small diameter, and they also introduced a mechanism to produce them. However, the internal processes able to reproduce these properties were unknown, and thus there was no growth model with which to study them. At this point, Barabási and Albert published a study [15] introducing a growth model for CN based on a node aggregation process with preferential attachment. This model yields networks with a degree distribution that decays as a power-law with an exponent equal to 3. Such a model, however, has a limitation: it only generates networks with a fixed degree distribution exponent. This result contrasts with real networks, whose power-law out-degree and in-degree distributions both exhibit exponents $\gamma \neq 3$. Therefore, a search began for new processes that could be included in the Albert-Barabási model in order to reproduce the topological properties found in real CN. Some of these local processes have already been studied. For example, Krapivsky et al. [16] implemented a nonlinear preferential attachment growth mechanism. Dorogovtsev and Mendes [17] proposed another model in which nodes receive an initial attractiveness at birth. In this way they resolved a contradiction in Barabási's original model for directed networks, which can be illustrated by the following example: on the first day of term, nobody in a classroom knows anyone else. Following Barabási's model, the probability that a student gains a new friend is proportional to the number of friends he or she already has. Since nobody has friends, no new friendships are created, which is not what happens in a real situation.
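The classroom example can be made concrete with a minimal sketch. The function name `attachment_probabilities` and the parameter `attractiveness` are illustrative choices, not notation from the cited papers, and the rule "probability proportional to in-degree plus a constant" is only a simplified reading of the initial-attractiveness idea.

```python
def attachment_probabilities(in_degrees, attractiveness=0.0):
    """Return the probability that each node receives the next link.

    Assumption: the weight of a node is its in-degree plus a constant
    initial attractiveness. With attractiveness = 0 this reduces to pure
    preferential attachment.
    """
    weights = [k + attractiveness for k in in_degrees]
    total = sum(weights)
    if total == 0:
        # Nobody has any links yet: pure preferential attachment cannot
        # select anyone, which is the contradiction described in the text.
        return None
    return [w / total for w in weights]


first_day = [0, 0, 0, 0]  # four students, no friendships yet
print(attachment_probabilities(first_day))        # None
print(attachment_probabilities(first_day, 1.0))   # [0.25, 0.25, 0.25, 0.25]
```

With any positive initial attractiveness every student can be chosen on the first day, after which the usual rich-get-richer dynamics takes over.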
Another local process studied is the one introduced by Dorogovtsev and Mendes [18], who considered that nodes may age and lose attractiveness. For example, the citations received by a scientific article decrease with time. Albert and Barabási introduced a model with rewiring [19], in which there is a probability $q$ that some node $i$ deletes its link $l_{ij}$ to a node $j$ and then uses preferential attachment to select a node $j'$, to which it connects using a new link $l_{ij'}$. This process may be found in some networks, such as the WWW when a webmaster deletes a hyperlink and adds a new one. Esquivel-Gómez et al. [20] described a model which prohibits multiple links between the same nodes. Acosta-Elias et al. [21] studied the impact of delays, which arise because each new node selects the node to which it connects using local information instead of global topological information. In other words, delays allow for a behavior in which new nodes select new connections based on a partial view of the system.
There are many other models, processes, and properties of CN that have been studied. Many of these may be consulted in various review articles [14, 22, 23]. However, after sixteen years of studies and analyses of CN, there still seems to be a lack of consensus and knowledge about the fundamental laws that govern these systems.
Using the percolation concept, there have been analytical studies of the properties of islands, or connected components, in random networks and CN [24]. Moreover, islands have been observed in real networks: Broder et al. [25] studied a WWW sample of approximately 200 million web pages and 1.5 billion links and discovered islands whose sizes, expressed as the number of pages, exhibit a power-law distribution with an exponent equal to 2.54. This power-law distribution of island sizes allows the existence of giant islands (giant connected components). Determining the probability that any given node belongs to these giant islands is important because it makes it possible to calculate the probability that a message generated by one node reaches another node, or to model strategies that could stop the transmission of an epidemic disease. In the literature it is possible to find growth models capable of generating CN with community structure [26-30], that is, networks containing groups of nodes strongly connected among themselves but weakly connected to the other nodes in the network. Likewise, there exist growth models [20, 31-35] in which every node added to the network always connects to existing ones. In other words, all the nodes in these models form a single island, which contains all the nodes of the network. However, in some real networks, such as the US patent citation network [36], the nodes form more than one island, and the island size distribution $I_s$ follows a power-law $I_s \sim s^{-\alpha}$ [37].
A possible cause for the emergence of islands in some real CN is that, during network growth, some nodes may be born with zero outgoing links (e.g., patents without references to other patents), and each such node generates a new island.
This paper proposes a growth model for directed CN that considers the emergence of islands as the network grows. In the proposed model new islands are created with probability $\Phi$, and two cases are considered: (1) $\Phi$ remains constant during the whole life of the network and (2) $\Phi$ decreases as the network grows. It is shown analytically, and validated through numerical simulation, that in the first case the model generates directed CN with power-law behavior in their in-degree ($Q_i \sim i^{-\gamma}$) and island size ($I_s \sim s^{-\alpha}$) distributions, with $\gamma$ and $\alpha$ both tunable from 2 to $\infty$. In the second case the exponents are fixed to $\gamma = 2$ and $\alpha = 1$, respectively. The paper is organized as follows. Section 2 outlines the features of the proposed model. The analytical solution of the model and the experiment details and results are presented in Section 3. Finally, the discussion and conclusions are given in Sections 4 and 5, respectively.

Model
We consider that the birth of new islands is governed by a probability $\Phi$, for which two cases are taken into account:
(i) $\Phi = 1/N$, where $N$ is the number of nodes in the network. In this case, the probability that a new island is born decreases as the number of nodes in the network increases. This idea is mapped from real networks as follows: in a citation network of scientific papers, when there are few papers (nodes), it is more probable that a new paper does not cite other papers (generating a new island) because it addresses an entirely new scientific topic. Conversely, as the number of papers increases, the probability that a new paper addresses an entirely new theme decreases; thus the probability of generating a new island also decreases.
(ii) $\Phi = p$, with $0 < p < 1$. In this case, the probability that a new island is created remains constant during the whole life of the network.
Furthermore, the network grows by adding one node at each time step. At the beginning, only node $v_0$ exists in the network, and for each new node $v_{\mathrm{new}}$ added to the network one of the following rules is applied:
(i) With probability $\Phi$, $v_{\mathrm{new}}$ does not connect to any node in the network. That is, $v_{\mathrm{new}}$ generates a new island (see Figure 1).
(ii) With complementary probability $1 - \Phi$, $v_{\mathrm{new}}$ randomly selects a node $v_j$ and connects to it, as well as to all nodes that have an incoming link from $v_j$ (see Figure 1).
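The two growth rules can be sketched as a short simulation. This is an illustrative reading of the rules, not the authors' code: the names `grow_network`, `island_size_counts`, and `in_degree_counts` are hypothetical, and the sketch assumes that $\Phi$ is evaluated with the number of nodes already present before the new node is added.

```python
import random
from collections import Counter


def grow_network(n_nodes, phi, seed=None):
    """Grow a directed network; phi(n) is the island-birth probability
    when n nodes already exist (an assumption about how Phi is evaluated)."""
    rng = random.Random(seed)
    out_links = [[]]   # out_links[v] = nodes that v points to; node 0 is alone
    island = [0]       # island[v] = label of the island containing v
    for new in range(1, n_nodes):
        if rng.random() < phi(new):
            # Rule (i): the new node connects to nobody and founds an island.
            out_links.append([])
            island.append(new)
        else:
            # Rule (ii): pick a random node, link to it and to every node
            # it points to; the new node joins the same island.
            target = rng.randrange(new)
            out_links.append([target] + out_links[target])
            island.append(island[target])
    return out_links, island


def island_size_counts(island):
    """Map island size s to the number of islands with s nodes."""
    return Counter(Counter(island).values())


def in_degree_counts(out_links):
    """Map in-degree i to the number of nodes with i incoming links."""
    in_degree = [0] * len(out_links)
    for targets in out_links:
        for t in targets:
            in_degree[t] += 1
    return Counter(in_degree)


out_links, island = grow_network(10_000, lambda n: 1.0 / n, seed=42)
print(sorted(island_size_counts(island).items())[:5])
print(sorted(in_degree_counts(out_links).items())[:5])
```

Passing `lambda n: 1.0 / n` corresponds to case (i) and `lambda n: p` for a fixed `p` to case (ii). Because a new node only ever links into one existing island, islands never merge in this sketch.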

Analytic Solution of the Model
3.1. Island Size Distribution. In order to obtain the analytical solution for $I_s$, the continuum method [38] was employed using the following differential equation:
$$\frac{\partial I_s}{\partial N} = \Phi\,\delta_{s,1} + (1-\Phi)\left[\frac{(s-1)\,I_{s-1}}{N} - \frac{s\,I_s}{N}\right], \qquad (1)$$
where $\delta_{s,1}$ is the Kronecker delta. Equation (1) describes the variation of the number of islands $I_s$ with $s$ nodes with respect to the total number $N$ of nodes in the network. The first term describes the birth of a new island; that is, it models the situation in which a new node $v_{\mathrm{new}}$ does not connect to any node (first rule of the model). The second term represents the second rule of the model. Its gain part accounts for the situation in which $v_{\mathrm{new}}$ randomly selects a node $v_j$ belonging to an island with $s-1$ nodes and connects to it, and thus $I_s(N)$ increases. Its loss part describes the situation in which $v_{\mathrm{new}}$ randomly selects a node $v_j$ belonging to an island with $s$ nodes and connects to it, and thus $I_s(N)$ decreases.
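The rate equation for $I_s$ can also be explored numerically. The sketch below assumes the form implied by the description above, namely a birth term $\Phi$ for $s = 1$ plus $(1-\Phi)$ times the difference between a gain $(s-1)I_{s-1}/N$ and a loss $sI_s/N$, and advances it one node at a time. The function name `integrate_island_sizes` and the truncation parameter `s_max` are illustrative.

```python
def integrate_island_sizes(n_max, phi, s_max):
    """Iterate the assumed rate equation for I_s from N = 1 to N = n_max.

    counts[s] is the expected number of islands with s nodes (index 0 is
    unused). phi(n) is the island-birth probability with n nodes present.
    """
    counts = [0.0] * (s_max + 1)
    counts[1] = 1.0  # the network starts with a single isolated node
    for n in range(1, n_max):
        p_new = phi(n)
        nxt = counts[:]
        for s in range(1, s_max + 1):
            birth = p_new if s == 1 else 0.0
            gain = (s - 1) * counts[s - 1] / n   # island of size s-1 grows to s
            loss = s * counts[s] / n             # island of size s grows to s+1
            nxt[s] = counts[s] + birth + (1.0 - p_new) * (gain - loss)
        counts = nxt
    return counts


counts = integrate_island_sizes(5000, lambda n: 1.0 / n, s_max=10)
for s in (1, 2, 5, 10):
    print(s, round(counts[s], 3), "vs 1/s =", round(1.0 / s, 3))
```

Truncating at `s_max` only discards islands that grow beyond that size, so the values for $s \le$ `s_max` are unaffected by the cutoff.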

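Equation (1) may also be written in the standard form for a linear differential equation:
$$\frac{\partial I_s}{\partial N} + (1-\Phi)\,\frac{s}{N}\,I_s = \Phi\,\delta_{s,1} + (1-\Phi)\,\frac{(s-1)}{N}\,I_{s-1}. \qquad (2)$$
In order to investigate the impact that $\Phi = 1/N$ and $\Phi = p$ have on $I_s$, (2) is solved for each of them. With $\Phi = 1/N$, (2) takes the following form:
$$\frac{\partial I_s}{\partial N} + \left(1-\frac{1}{N}\right)\frac{s}{N}\,I_s = \frac{\delta_{s,1}}{N} + \left(1-\frac{1}{N}\right)\frac{(s-1)}{N}\,I_{s-1}. \qquad (3)$$
In order to obtain $I_s(N)$, (3) is solved for $s = 1$, $s = 2$, and so on. For $s = 1$, (3) takes the following form:
$$\frac{\partial I_1}{\partial N} + \left(1-\frac{1}{N}\right)\frac{I_1}{N} = \frac{1}{N}, \qquad (4)$$
and solving (4) gives
$$I_1(N) = 1 + \frac{e^{-1/N}}{N}\left[C - \mathrm{Ei}\!\left(\frac{1}{N}\right)\right], \qquad (5)$$
where $C$ is a constant and $\mathrm{Ei}(\cdot)$ is the exponential integral function. As $N \gg 1$, (5) yields $I_1(N) \approx 1$. Solving (3) for the following $s$ values produces the following:
$$I_2(N) \approx \frac{1}{2}, \quad I_3(N) \approx \frac{1}{3}, \quad I_4(N) \approx \frac{1}{4}, \ \ldots \qquad (6)$$
From the previous result (see (6)) it is possible to deduce that
$$I_s(N) \approx \frac{1}{s} = s^{-1}. \qquad (7)$$
That is, with $\Phi = 1/N$ the proposed model is able to produce island size distributions with a power-law behavior $I_s \sim s^{-\alpha}$ with fixed exponent $\alpha = 1$.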
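For the constant case $\Phi = p$, the following worked sketch shows how a tunable exponent can arise. It is not taken from the text: it assumes that the rate equation has the gain and loss structure described for equation (1), and that for large $N$ the island counts grow linearly, $I_s(N) \approx n_s N$, where $n_s$ is a symbol introduced here only for illustration.

```latex
% Assumed rate equation with constant \Phi = p:
%   dI_s/dN = p\,\delta_{s,1} + (1-p)\,[(s-1) I_{s-1} - s I_s]/N
% Assumed large-N ansatz: I_s(N) \approx n_s N.
\begin{align*}
  n_s &= p\,\delta_{s,1} + (1-p)\left[(s-1)\,n_{s-1} - s\,n_s\right], \\
  n_1 &= \frac{p}{2-p}, \qquad
  \frac{n_s}{n_{s-1}} = \frac{s-1}{\,s + \frac{1}{1-p}\,} \quad (s \ge 2), \\
  n_s &= n_1\,
        \frac{\Gamma(s)\,\Gamma\!\left(2+\frac{1}{1-p}\right)}
             {\Gamma\!\left(s+1+\frac{1}{1-p}\right)}
      \;\sim\; s^{-\left(1+\frac{1}{1-p}\right)} \quad (s \gg 1).
\end{align*}
```

Under these assumptions the exponent would be $\alpha = 1 + 1/(1-p)$, which tends to 2 as $p \to 0$ and grows without bound as $p \to 1$, in line with the range from 2 to $\infty$ stated for the constant case.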

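3.2. In-Degree Distribution. In order to obtain the analytical solution for the in-degree distribution generated with the proposed model, the continuum method is used [38]. Hence, the differential equation that describes the in-degree distribution may be written as follows:
$$\frac{\partial Q_i}{\partial N} = (1-\Phi)\left[\frac{Q_{i-1}}{N} + \frac{(i-1)\,Q_{i-1}}{N}\right] - (1-\Phi)\left[\frac{Q_i}{N} + \frac{i\,Q_i}{N}\right] + (1-\Phi)\,\delta_{i,0} + \Phi\,\delta_{i,0}. \qquad (15)$$
Figure 2: A network comprising three nodes ($v_0$, $v_1$, $v_2$). In this network, the in-neighbors of $v_0$ are $v_1$ and $v_2$. Also, the number of nodes with three incoming links is $Q_3 = 0$. There are two possible ways to increase $Q_3$: (1) a new node $v_3$ randomly selects node $v_0$ and connects to it (the first part of the gain term in (15)); thus $Q_3 = 1$; (2) a new node $v_3$ randomly selects an in-neighbor of $v_0$ and connects to it (solid line) and to $v_0$ (dashed line), as described by the second part of the gain term in (15).
In (15), the first bracket collects the processes that increase $Q_i$ and the second bracket those that decrease it. Finally, the last two terms model the effect of adding a new node with zero incoming links using the second and the first rule of the model, respectively. Equation (15) may be written in the standard form for a linear differential equation:
$$\frac{\partial Q_i}{\partial N} + (1-\Phi)\,\frac{(i+1)}{N}\,Q_i = (1-\Phi)\,\frac{i}{N}\,Q_{i-1} + \delta_{i,0}. \qquad (16)$$
In order to analyze the impact that $\Phi = 1/N$ and $\Phi = p$ have on $Q_i$, (16) is solved for each of them. For $\Phi = 1/N$, (16) takes the following form:
$$\frac{\partial Q_i}{\partial N} + \left(1-\frac{1}{N}\right)\frac{(i+1)}{N}\,Q_i = \left(1-\frac{1}{N}\right)\frac{i}{N}\,Q_{i-1} + \delta_{i,0}. \qquad (17)$$
Solving (17) for successive values of $i$, it is possible to deduce that, for $N \gg 1$,
$$Q_i(N) \approx \frac{N}{(i+1)(i+2)} \sim N\,i^{-2}. \qquad (18)$$
That is, with $\Phi = 1/N$ the proposed model is able to produce in-degree distributions with a power-law behavior $Q_i \sim i^{-\gamma}$ for $1 < i < N$ with fixed exponent $\gamma = 2$. The same result was previously obtained by Krapivsky and Redner [32] in a similar model that does not contemplate the emergence of islands during network growth.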
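The exponent of the in-degree distribution can be checked with a short numerical sketch. It assumes that, for large $N$, $Q_i \approx n_i N$ and that a node with $i$ incoming links gains a link at rate $(1-\Phi)(i+1)/N$ (it is selected directly or through one of its $i$ in-neighbors), which is how the gain and loss terms are described in the text. The names `stationary_in_degree` and `local_exponent` and the symbol $n_i$ are illustrative; setting `p = 0` stands in for the $\Phi = 1/N$ case at large $N$.

```python
import math


def stationary_in_degree(i_max, p=0.0):
    """Stationary fractions n_i = Q_i / N under the assumed rate equation
    n_i + a (i + 1) n_i = a i n_{i-1} + delta_{i,0}, with a = 1 - p."""
    a = 1.0 - p
    n = [1.0 / (1.0 + a)]  # i = 0: every new node starts with no incoming links
    for i in range(1, i_max + 1):
        n.append(a * i * n[i - 1] / (1.0 + a * (i + 1)))
    return n


def local_exponent(n, i):
    """Estimate gamma from the decay of n between i and 2i."""
    return -math.log(n[2 * i] / n[i]) / math.log(2.0)


n = stationary_in_degree(2000)
print(n[3], 1.0 / (4 * 5))        # compare with 1 / ((i + 1)(i + 2)) at i = 3
print(local_exponent(n, 500))     # close to 2

n_half = stationary_in_degree(2000, p=0.5)
print(local_exponent(n_half, 500))  # larger exponent for constant p
```

Under these assumptions the recursion for constant $p$ decays with an exponent that approaches $1 + 1/(1-p)$, which again spans the range from 2 to $\infty$ quoted for the constant case.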

3.3. Experiment Details and Results. In order to validate the analytical predictions for $I_s$ (see (7) and (13)), four numerical simulations were performed. In each simulation, a directed network was grown from $N = 1$ to $10^4$ nodes according to the proposed model. Figure 3 shows that the $I_s$ produced by the simulations agrees with the analytical results.
In order to validate the analytical predictions for $Q_i$ (see (18) and (20)), four numerical simulations were performed. In each simulation, a directed network was grown from $N = 1$ to $10^4$ nodes. Figure 4 compares the $Q_i$ produced by the simulations with the analytical results, showing that the two agree.
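One simple way to compare a simulated histogram with a predicted power-law is to estimate the slope on a log-log scale. The sketch below uses an ordinary least-squares fit; this is a crude estimator chosen for brevity, not the procedure used in the paper, and the function name `fit_power_law_exponent` and the `x_min` cutoff are illustrative.

```python
import math


def fit_power_law_exponent(counts, x_min=1):
    """Least-squares slope of log(count) versus log(x).

    counts maps a positive value x (island size or in-degree) to the number
    of observations. Returns the exponent, positive for a decaying law.
    """
    points = [(math.log(x), math.log(c))
              for x, c in counts.items() if x >= x_min and c > 0]
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx


# Synthetic check on an exact power-law with exponent 2.
exact = {i: 1000.0 * i ** -2.0 for i in range(1, 50)}
print(fit_power_law_exponent(exact))
```

On real simulation output the tail of the histogram is noisy, so restricting the fit with `x_min` or averaging over several runs would likely be needed before comparing with the analytical exponents.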
Figure 5 shows the experimental out-degree distribution $Q_{\mathrm{out}}$. It can be seen that when $\Phi = 1/N$, the model produces networks with $Q_{\mathrm{out}}$ following a Poisson distribution (Figure 5(a)). On the other hand, when $\Phi = p$, $Q_{\mathrm{out}}$ decays as an exponential function as $p \to 1$, whereas it exhibits a combination of an exponential and a Poisson distribution when $p \to 0$ (see Figure 5(b)).
Figure 3: ... (a) Equation (7). (b) Equation (13) with $p$ = 0.1, 0.5, and 0.7.

3.4. Numerical Study of $Q_i$ and $Q_{\mathrm{out}}$ in the Islands. In order to investigate the in-degree and out-degree distributions occurring in the islands generated with our model, several numerical simulations were also performed. Each simulation grew a network from 1 to $10^4$ nodes with $\Phi = 1/N$ or $\Phi = p$ (with $p$ = 0.1, 0.5, 0.7), after which three islands of different sizes were randomly selected.
Figures 6(a)-6(d) show that the in-degree distributions of the islands all follow a power-law with exponent $\gamma = 2$. The only difference among them is the scale, owing to the size of each island. More precisely, all the in-degree distributions are consistent with (18) and with the result obtained by Krapivsky and Redner [32]. This result is easily interpreted: within each island, the in-degree distribution is governed only by the copy process, which gives rise to the power-law with exponent $\gamma = 2$. Similar behavior is observed for the out-degree distribution (Figure 7).

Discussion
Importantly, the case in which $\Phi$ increases with the number of nodes is not considered in this model. This is because when $N$ is large enough, new nodes added to the network would have a high probability of not connecting to other nodes, thus generating new islands. Therefore, the resulting network would be composed of a large number of isolated nodes. Also, the situation in which a new node connects to nodes in different islands is not considered, because this would result in the fusion of two or more islands. These cases will be included in future work.
Also, the proposed model produces networks with out-degree distributions that follow exponential and Poisson distributions. This contrasts with several real networks, in which the out-degree distribution follows a power-law. This is a limitation of the proposed model and could be a topic of future study.
Finally, it should be noted that the result obtained in (14) is similar to that of the model proposed by Simon [39], which was developed to explain the distribution of words in prose samples by their frequency of occurrence. It is remarkable that such different processes produce similar mathematical results.

Conclusions
In summary, large complex networks such as the US patent citation network [36] have more than one island, and their island size distribution $I_s$ follows a power-law $I_s \sim s^{-\alpha}$ [37]. In order to reproduce this behavior, we have introduced a growth model for complex networks that considers the creation of islands during the growth of the network. Two cases are considered in this model: in the first, the creation of islands is more frequent in the early stages of network growth; in the second, the probability of creating new islands remains constant as the network grows. In the first case, the generated networks have $I_s$ and $Q_i$ with power-law behavior and scaling exponents $\alpha = 1$ and $\gamma = 2$, respectively. In the second case, the generated networks exhibit $I_s$ and $Q_i$ with power-law behavior and scaling exponents that both range from 2 to $\infty$.

Figure 7: Out-degree distributions of islands of different sizes found in the networks generated with the proposed model with $\Phi = 1/N$ (a) and $p$ = 0.1, 0.5, and 0.7 (b-d).
Figure 1: Growth of a directed network using the proposed model. $\Phi$ is the probability that a new island emerges. At the beginning ($t_0$), only node $v_0$ exists in the network. At the next time step ($t_1$), node $v_1$ is added to the network and it is assumed that it connects to node $v_0$. At $t_2$, node $v_2$ is added and it is assumed that it does not connect to any node; thus a new island is generated. At $t_3$, $v_3$ is added and it is assumed that it connects to $v_1$ and to $v_0$ (dashed arrow) because $v_0$ has an incoming link from $v_1$. At $t_4$, $v_4$ generates a new island, as $v_2$ did at $t_2$. At $t_5$ and $t_6$, nodes $v_5$ and $v_6$ are added and connect to the network as $v_3$ did at $t_3$.
Equation (15) describes the variation of the number $Q_i$ of nodes with $i$ incoming links with respect to the number $N$ of nodes in the network. The first term represents how the number of nodes with $i$ incoming links increases: its first part describes how a new node $v_{\mathrm{new}}$ randomly selects a node $v_j$ with $i-1$ incoming links and connects to it, and its second part describes how $v_{\mathrm{new}}$ randomly selects an in-neighbor of a node $v_j$ that has $i-1$ incoming links and connects to it (see Figure 2); thus $Q_i$ increases. The second term takes into account how the number of nodes with $i$ incoming links decreases, and its two parts play roles analogous to those of the first term.