Quantitatively evaluating how similar two complex topologies are is an essential procedure in large-scale topology research, and it remains an NP problem. This paper introduces a cross-correlation evaluation model (CCEM), used together with a genetic algorithm (GA), to address this issue. Experiments show that the signless Laplacian spectra (SLS) can identify a topology structure and that CCEM can distinguish the differences between the SLS eigenvalues of the corresponding topologies. Finally, using CCEM within a GA is recommended, because accepting a solution that is not the optimum is a good way to reduce computing complexity.

The research on the Internet topology modeling has been growing into a hot topic in Internet-related research fields recently [

We take Internet topology as an example trying to solve this issue by constructing a quantitative model including cross-correlation evaluation model (CCEM), spectral density [

A nondirected graph

Spectrum of a graph

Spectral density

The samples are the measured router-level Internet results with 1,145,841 routers (nodes) and 2,907,638 links. After IP alias solution [

To further simplify the computation, we performed a second-order sampling (re-sampling) operation on the experiment samples. The re-sampling rules are as follows: (1) the re-sampling operation is completely random and can start from any effective node in the target graph; (2) the re-sampled result must be a connected graph; (3) the re-sampled result should cover as many nodes as possible, that is, node selection takes priority over link selection.
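The three rules above can be illustrated with a minimal Python sketch. It is not the authors' re-sampling tool: the function name `resample_connected`, the dictionary-of-sets graph representation, and the `target_nodes` budget are assumptions made for illustration. The sketch grows the sample one new node at a time from a random start, so the result is always connected and every added link reaches a previously uncovered node.

```python
import random


def resample_connected(adj, target_nodes, seed=None):
    """Grow a random connected sample of at most `target_nodes` nodes.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    Returns (nodes, links); each link is a frozenset holding its two end nodes.
    """
    rng = random.Random(seed)
    start = rng.choice(sorted(adj))      # rule 1: start from any node, chosen at random
    nodes = {start}
    links = set()
    frontier = [start]                   # sampled nodes that may still have unsampled neighbours
    while frontier and len(nodes) < target_nodes:
        u = rng.choice(frontier)
        fresh = sorted(v for v in adj[u] if v not in nodes)
        if not fresh:
            frontier.remove(u)           # nothing new reachable from u any more
            continue
        v = rng.choice(fresh)            # rule 3: every step covers a new node
        nodes.add(v)
        links.add(frozenset((u, v)))     # rule 2: v is attached to the sample, so it stays connected
        frontier.append(v)
    return nodes, links
```

A sample grown this way is a tree with one link fewer than nodes. Links between nodes that are already in the sample could be added in a second pass if a denser sample is wanted.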

Finally, the re-sampled Internet topology graph was converted into an adjacency matrix for further calculation.

Before using spectral density to construct CCEM, we first test whether it can distinguish topology graphs (including the Internet topology).

Three representative graphs were selected for the test in this paper: an ER random graph, a scale-free graph, and an Internet topology graph.

According to [

And spectrum density of a scale-free graph out of BA model [

We can find from [

For simplicity and better comparison, we drew three samples of the Internet graph with the re-sampling tool mentioned above. The sizes of the three samples after re-sampling are 30 nodes and 29 links, 300 nodes and 536 links, and 500 nodes and 753 links, respectively. Their eigenvalues and spectral densities are listed in Table

Eigenvalues and spectral density of three re-sampled Internet graphs.

| Eigenvalue (30 ips) | Spectral density (30 ips) | Eigenvalue (300 ips) | Spectral density (300 ips) | Eigenvalue (500 ips) | Spectral density (500 ips) |
|---|---|---|---|---|---|
| −3.2196 | 0.0333 | −8.7818 | 0.0033 | −10.7058 | 0.0020 |
| −2.6318 | 0.0333 | −8.0004 | 0.0033 | −10.2681 | 0.0020 |
| −0.5663 | 0.0333 | −0.1767 | 0.0033 | −0.2635 | 0.0020 |
| −0.0000 | 0.5333 | −0.0000 | 0.5567 | −0.0000 | 0.7320 |
| 0.5663 | 0.0333 | 0.1479 | 0.0033 | 0.1113 | 0.0020 |
| 2.6318 | 0.0333 | 8.8174 | 0.0033 | 10.9470 | 0.0020 |
| 3.2196 | 0.0333 | 14.1650 | 0.0033 | 12.3570 | 0.0020 |
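As a rough illustration of how such a table can be produced, the following Python sketch computes the eigenvalues of an adjacency matrix and the share of the spectrum located at each value. It assumes that the spectral density of a value is the fraction of eigenvalues equal to it, which is consistent with the entries above (for example, 1/30 ≈ 0.0333); the function name `spectral_density` and the rounding tolerance are illustrative choices, and NumPy is assumed to be available.

```python
from collections import Counter

import numpy as np


def spectral_density(adjacency, decimals=4):
    """Map each distinct eigenvalue (rounded) to its share of the spectrum."""
    a = np.asarray(adjacency, dtype=float)
    eigenvalues = np.linalg.eigvalsh(a)               # symmetric matrix, so the eigenvalues are real
    rounded = np.round(eigenvalues, decimals) + 0.0   # adding 0.0 turns -0.0 into 0.0
    counts = Counter(rounded.tolist())
    n = len(eigenvalues)
    return {value: count / n for value, count in sorted(counts.items())}
```

For a star graph with one hub and four leaves, the eigenvalues are −2, 0 (three times), and 2, so the sketch returns densities 0.2, 0.6, and 0.2. This toy case shows the same pattern as the table: a spectrum symmetric about zero with most of its weight at zero.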

The symmetry of the spectral density could be found from Table

However, there are differences between the graphs, and we illustrated the Internet’s spectra diagram in Figure

Spectral density diagrams of three Internet graphs. The subgraph in the top-right is a plot zoomed in to

From Figure

All three graphs comprise quite different sizes and contents (specific routers and links) due to re-sampling rules, and the conformity found in Figure

Next, we find that the center of three spectral density curves in Figure

When the Internet graph is compared with the ER graph in the same way, the differences are easily found. We therefore conclude that spectral density can distinguish the Internet graph from the ER graph.

Since spectral density also gives a quantitative description of Internet topology characteristics, we use it in CCEM for Internet topology modeling.

For a better view of spectra distribution, we calibrate the coordinate system by a factor of

What is more, we enlarge the size of the re-sampled Internet topology graph from 30 ips, 300 ips, and 500 ips (Figure

Spectral density of five re-sampled graphs. The sub-graph in the top-right is a plot zoomed in to

We know that the more nodes a graph has, the closer it is to the real Internet. However, the largest graph in this paper has 4000 ips, for two reasons: (1) computing ability is limited, and the efficiency of calculating spectral density decreases sharply once the size of the graph grows beyond 4000; (2) Internet characteristics are well expressed by spectral density no matter how many nodes an Internet graph has. This has been shown in Figure

From Figure

Similar to what was found in Figure

We now return to the basic idea of this paper: distinguishing topology graphs by comparing their spectral density. However, spectral density is somewhat coarse-grained. There is another especially valuable kind of spectra, named Signless Laplacian Spectra (SLS), which gives further and finer information on a graph’s properties [

An SLS matrix

SLS analysis results on four 3000-ip graphs, where axis

From Figure

There are two evident horizontal lines where SLS equals 1 ($10^{0}$) and 2, which means that the largest number of nodes in the Internet topology graph is found at SLS = 1, and the second largest at SLS = 2. All four samples clearly exhibit the same properties in Figure

For the other part of Figure

Power-law distribution fitting results for descending eigenvalues with SLS > 2 in the four re-sampled graphs.

Power-law distribution fitting results for descending eigenvalues with SLS < 1.

From Figure

However, there is not clear power-law relationship since ACC is rather small in Figure

Compared with the general spectral density, SLS is better since (1) SLS is recommended to be the best spectra in [

SLS is therefore selected for studying CCEM.
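A minimal Python sketch of obtaining SLS eigenvalues from an adjacency matrix follows. It assumes the standard definition of the signless Laplacian, Q = D + A, where A is the adjacency matrix and D is the diagonal matrix of node degrees; the function name `sls_eigenvalues` is illustrative, NumPy is assumed to be available, and the eigenvalues are returned in descending order.

```python
import numpy as np


def sls_eigenvalues(adjacency):
    """Eigenvalues of the signless Laplacian Q = D + A, largest first."""
    a = np.asarray(adjacency, dtype=float)
    degree = np.diag(a.sum(axis=1))     # D: node degrees on the diagonal
    q = degree + a                      # assumed standard signless Laplacian
    return np.sort(np.linalg.eigvalsh(q))[::-1]
```

For a star graph with one hub and four leaves, this gives the sequence 5, 1, 1, 1, 0. In this toy case the repeated eigenvalue 1 is produced by the degree-one leaves attached to the same hub.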

Evaluating an Internet model means determining the differences between the generated Internet topology and the real Internet topology. SLS eigenvalue sequences are introduced as a quantitative way to determine these differences.

The SLS eigenvalues are a series of numbers representing the primary characteristics of the target graph, that is, the Internet topology graph. Given the two eigenvalue sequences, the remaining problem is to find an effective algorithm that evaluates how they differ.

CCEM is then used to evaluate whether a given or generated topology is similar or identical to the real Internet topology. The first requirement of CCEM is to transform the SLS into a data sequence.

After the SLS eigenvalues are sorted in descending order, the data sequence is obtained and is ready for the next evaluation step, as shown in:

Cross-correlation algorithm is capable of distinguishing and identifying the differences between numerical number sequences in an absolutely quantitative way [

If two given topologies are completely identical, then:

Now, we have proved that cross-correlation value reaches maximum when

Next, we are going to prove when

When the disalignment lag

And for

We then use SLS eigen values from Figure

From Figure

Autocorrelation calculation of SLS eigen values with disalignment lags, all four SLS sequences come from Figure

And for Figure

Cross-correlation calculation of SLS eigen values with disalignment lags, all four SLS sequences come from Figure

The four SLS sequences, however, all come from the real Internet topology and are quite similar to each other. We can see that the maxima of the three cross-correlations nearly reach 1, quite close to the maximum value of the autocorrelation in Figure

So far it seems that similar topologies always reach a maximum close to 1 in cross-correlation calculations. What about dissimilar topologies? We select SLS(1), calculate its cross-correlation with three random sequences, and illustrate the results in Figure

Cross-correlation calculation of SLS (1) eigen values and three random sequences with disalignment lags, SLS sequences (1) come from SLS(1) and random (1) in Figure

From Figure

Secondly, the growing curves are no longer close to zero, but close to 0.1. The reason is that part of each randomly generated sequence is “similar” in some way to part of SLS sequence (1). The “similarity,” however, is quite low, since the cross-correlation values are near 0.1 and 0.2, quite far from 1, the value obtained from the cross-correlation of completely identical topologies.

With Proof 1 and illustrations from Figures

The result obtained from CCEM is a relatively large cross-correlation value if the two sequences, and hence the two topologies, are similar to each other, and a small value otherwise. A threshold is therefore usually set for making decisions when CCEM is used to evaluate an Internet topology model.
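The evaluation described above can be sketched in Python as follows. The sketch assumes a cross-correlation normalised by the energies of the two sequences, so that the autocorrelation of a sequence equals 1 at zero lag; the exact normalisation used in this paper is not reproduced here, and the names `cross_correlation` and `ccem_score` are illustrative.

```python
import math


def cross_correlation(x, y):
    """Return {lag: normalised cross-correlation} for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        raise ValueError("sequences must not be all zero")
    result = {}
    for lag in range(-(n - 1), n):
        total = 0.0
        for i in range(n):
            j = i + lag                  # y is shifted by `lag` positions against x
            if 0 <= j < n:
                total += x[i] * y[j]
        result[lag] = total / norm
    return result


def ccem_score(x, y):
    """Peak of the normalised cross-correlation over all lags."""
    return max(cross_correlation(x, y).values())
```

With this normalisation, the Cauchy-Schwarz inequality bounds every value by 1 in magnitude. A decision rule then compares `ccem_score(x, y)` with a chosen threshold; the threshold value itself is left to the user.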

The CCEM algorithm for the Internet topology is shown in Table

The CCEM algorithm for the Internet topology modeling.

| Steps | Operations |
|---|---|
| (1) | Get an adjacency matrix of the target Internet topology graph (as a template Internet graph) by the Internet re-sampling tool; |
| (2) | Get the SLS eigenvalue sequence by SLS operations; |
| (3) | LOOP |
| (3.1) | Construct a modeled Internet with an Internet model (with specific parameters), and get its adjacency matrix; |
| (3.2) | Get its SLS eigenvalue sequence; |
| (3.3) | Perform the cross-correlation algorithm on the two sequences: the modeled sequence from (3.2) and the template sequence; |
| | End LOOP when the cross-correlation result is greater than the threshold, which implies that the modeled Internet is similar enough to the real Internet; |
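The loop in the table can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the SLS matrix is taken to be the standard signless Laplacian Q = D + A, the cross-correlation is normalised so that identical sequences score 1, `generate_model` is a hypothetical callable standing in for "an Internet model", `random_graph` is a toy stand-in generator, and `max_rounds` is an added safeguard against an endless loop. NumPy is assumed to be available.

```python
import numpy as np


def sls_sequence(adjacency):
    """SLS eigenvalue sequence in descending order (assumes Q = D + A)."""
    a = np.asarray(adjacency, dtype=float)
    q = np.diag(a.sum(axis=1)) + a
    return np.sort(np.linalg.eigvalsh(q))[::-1]


def peak_cross_correlation(x, y):
    """Largest normalised cross-correlation value over all lags."""
    norm = np.sqrt(np.dot(x, x) * np.dot(y, y))
    if norm == 0:
        return 0.0                       # a graph without links has an all-zero sequence
    return float(np.max(np.correlate(x, y, mode="full")) / norm)


def ccem_search(template_adjacency, generate_model, threshold, max_rounds=100):
    """Generate candidate graphs until one is similar enough to the template."""
    template = sls_sequence(template_adjacency)        # template SLS sequence
    best_score, best_adjacency = -1.0, None
    for round_index in range(max_rounds):              # LOOP
        candidate = np.asarray(generate_model(round_index), dtype=float)   # (3.1)
        if candidate.shape != np.shape(template_adjacency):
            raise ValueError("modeled and template graphs must have the same size")
        score = peak_cross_correlation(sls_sequence(candidate), template)  # (3.2), (3.3)
        if score > best_score:
            best_score, best_adjacency = score, candidate
        if score > threshold:                           # End LOOP condition
            break
    return best_score, best_adjacency


def random_graph(n, p, rng):
    """Toy stand-in model: symmetric 0/1 adjacency matrix with link probability p."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T
```

A call such as `ccem_search(template, lambda i: random_graph(n, 0.1, rng), threshold=0.9)`, with `rng = np.random.default_rng(0)`, returns the best score found and the corresponding adjacency matrix; the values 0.1 and 0.9 are placeholders.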

The size of the modeled Internet graph and that of the real Internet graph must be identical, and the user can control how this value is set. Real Internet graphs of different sizes are quite different, and even real Internet graphs of the same size that are re-sampled at different times are not identical to each other. The result obtained from the algorithm may therefore differ somewhat each time.

But we still consider the CCEM algorithm to be effective because (1) the properties of the real Internet by re-sampling rules are quite similar (Figures

We recommend using CCEM within a genetic algorithm (GA), for the following reasons.

GA fits the CCEM studied in this paper quite well. GA can perform the calculation and optimization directly when CCEM is used to evaluate a given topology against the real Internet topology and to optimize it toward the real Internet topology.

Most Internet modeling researches are out of statistics at present because the Internet is too large to be handled by other approaches. And the most statistical result is a mathematical model with uncertain parameters, for example, some parameters are defined as data sequences [

Meanwhile, GA is good at reducing computing complexity through its ability to find a suboptimal (second-best) solution.

CCEM is therefore recommended for use within a GA in Internet topology modeling and other large-scale topology research.
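One possible way to combine the two is sketched below in Python. This is a generic, minimal GA and not a procedure specified in this paper: `fitness` is a hypothetical callable that would build a topology from the model parameters and return its CCEM cross-correlation value against the template, and the operators (tournament selection, uniform crossover, Gaussian mutation, one elite survivor) are illustrative choices. The search stops as soon as the threshold is reached, which reflects the idea of accepting a good-enough solution instead of the optimum.

```python
import random


def genetic_search(fitness, bounds, threshold, population_size=20,
                   generations=50, mutation_scale=0.1, seed=None):
    """Search model parameters until `fitness` reaches `threshold`.

    fitness: callable taking a list of parameters and returning a score.
    bounds:  list of (low, high) pairs, one per parameter.
    """
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def mutate(individual):
        child = []
        for gene, (lo, hi) in zip(individual, bounds):
            gene += rng.gauss(0.0, mutation_scale * (hi - lo))
            child.append(min(hi, max(lo, gene)))        # keep the parameter inside its bounds
        return child

    def crossover(a, b):
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def tournament(scored):
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [random_individual() for _ in range(population_size)]
    best_score, best = float("-inf"), None
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:
            best_score, best = top_score, top
        if best_score >= threshold:
            break                                       # good enough: stop without seeking the optimum
        population = [best] + [                         # the best individual survives unchanged
            mutate(crossover(tournament(scored), tournament(scored)))
            for _ in range(population_size - 1)
        ]
    return best_score, best
```

In an Internet modeling setting, `bounds` would hold the admissible ranges of the uncertain model parameters, and `threshold` would be the CCEM decision threshold.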

CCEM and its algorithm were studied in this paper. First, we tested the ability of spectral density to distinguish different graphs by applying it to an ER random graph, a BA scale-free graph, and the Internet topology graph. The three resulting spectra showed quite different properties, which confirmed that the spectral density approach can distinguish and identify Internet graphs.

Next, we obtained the SLS eigenvalues of a topology and fed them into CCEM to quantitatively evaluate the difference between graphs.

Finally, using CCEM within a GA was recommended for Internet topology modeling and other large-scale topology research to reduce computing complexity.

This work is supported by the National Natural Science Foundation of China (60802031), the Liaoning Provincial Natural Science Foundation (201003676), and the Natural Science Foundation of Shenyang city (F10-205-1-26).