The Relation between Degree and Strength in the Complex Network Derived from an Individual Stock

A coarse-graining method is introduced to construct a directed weighted complex network that models the transformation of the trading data of an individual stock. The degree (strength) distribution of the derived network follows a power law. A moderated regression equation with interaction effects of average return and out-degree (in-degree) on out-strength (in-strength) is established. Moreover, we find that the differences among nodes affect the network's structure and that the average return level significantly impacts nodes' eigenvector centrality and PageRank.


Introduction
It is inevitable that the financial market turns into a complex system [1] if the hypothesis of the rational man is discarded and the influences of factors such as the differences among investors, their learning ability, or the external economic environment are considered. As one of the most important objects of study in econophysics [2], the statistical laws of financial market variables have been revealed by various methods, including statistical physics, complex systems theory, and stochastic processes. A complex network, that is, a graph composed of numerous vertices and intricate connections which is neither completely regular nor completely random and which exhibits a large number of nontrivial topological features, is an effective tool for extracting the hidden relations of a complex system. ER random graphs [3], small world networks [4], and scale-free networks [5], which are the most common models, have been widely used to interpret the complexity of the real world. Besides, applications of complex network analysis have recently attracted considerable interest in the financial market as well as in other fields. For example, based on the generalized Indian Buffet process, Boldi et al. introduce a network model which reproduces complex phenomena well; thus some local and global properties of real networks are explained [6].
Mei et al. redefine a "Complex Agent Network" which is able to capture the multiscale spatiotemporal features of complex systems [7]. In [8], time series are transformed into complex networks, and Xu et al. show that the distribution of subgraphs characterizes different types of continuous dynamics. Donner et al. provide a thorough reinterpretation of some statistical measures, computed for recurrence networks obtained from nonlinear time series, in terms of phase space properties of dynamical systems [9].
There are many applications of complex networks to the financial market. In [10], the evolution of the topology of the global financial networks is used to evaluate systemic risk. Based on payments data, De La Torre et al. reconstruct the economic structure of Estonia, and an attack simulation is used to analyze the vulnerability of the national economy [11]. Fan models a dynamically evolving complex bank network system and designs a method to evaluate systemic risk by means of a lending-borrowing algorithm and a multiterm clearing algorithm [12]. In [13], the market graph, constructed by cross-correlation and showing a quite stable power-law structure, is considered a representation of the "self-organized" stock market. Stefan and Atman find in [14] that the morphology of the trust network can drastically alter the behavioral model as well as the fluctuations of the stock market index. Tse et al. construct several scale-free complex networks for US stocks over two periods by calculating cross-correlations and suggest that the stock market is heavily influenced by the stocks of the financial sector [15]. To analyze the Shanghai stock index, Zhang et al. build several scale-free (or small world) networks, suggesting the existence of hubs and that the segments correlated with a given one appear in a Poisson process [16]. Chen et al. suggest that the centrality and modularity of a correlation-based complex network can be used to detect the effect of interconnection on stock returns and industries [17]. See also [18][19][20][21][22].
Most of the above studies focus on the variation of stock market indices or on various relations between stocks and external circumstances. Briefly, there are 2 types of network construction: (1) in terms of the similarity of segments and (2) according to the correlation coefficients of different derivatives' return curves. Neither of these two methods is suitable for exploring the laws governing the transformation of financial time series. For example, the classical GARCH model [23], which is used to describe the time-varying volatility clustering of financial time series, cannot be represented by the above 2 kinds of networks because both of them discard the diachronic connection between two adjacent nodes. In this paper, a coarse-graining method is adopted to construct a directed weighted complex network which exhibits the time series of a stock's exchange data as well as its transformation. We concentrate on the following two questions:
(i) Are there any rules governing the transformation of stock trading?
(ii) How does the stock return affect the structure of the corresponding complex network?
The rest of this paper is organized as follows. In Section 2, a directed weighted complex network is derived from an individual stock (sh600519) by means of a coarse-graining method. In Section 3, the parameters of the network's degree (strength) distribution are estimated and tested, and a regression equation of strength on degree, return, and their interaction is established. This section also includes an analysis of edge weights and other statistics. Finally, we summarize this paper and discuss some future work in Section 4.

Methodology
The stock data of "sh600519" from Jan. 4, 2015, to Jan. 4, 2016, is captured from Sina Buz&Tech. Taking it as an example, we illustrate how to construct a directed weighted network which displays stock trading. In particular, the data between 9:25:05 and 9:32:18 on Jan. 4, 2016, is shown in Figure 1.
Let $X_t = (p_t, V_t)^T$ (T is the transpose operator), which represents the trading data of sh600519 at time $t$, be a 2-dimensional random variable, where $p_t$ is the stock price and $V_t$ is the trading volume. Denote $X_{t_1,t_2} = (X_t)_{t_1 \le t \le t_2}$, where the subscript set of $X$ is ascending. For an arbitrary $t$, assume that there is a time-varying system $f_t$, drastically affected by multiple factors that are difficult to quantify, such as investors' expectations, and that there are two positive numbers $\Delta t$ and $\Delta t'$ such that $X_{t-\Delta t,t}$ is regarded as the input of $f_t$ and $X_{t,t+\Delta t'}$ as the output; then

$$X_{t,t+\Delta t'} = f_t\left(X_{t-\Delta t,t}\right) + \varepsilon(t), \quad (1)$$

where the error $\varepsilon(t)$ is composed of two parts: (i) a random error, which comes from the influences of the external economic environment; (ii) a systematic error, which originates from both the changes of the investment strategies of the shareholders and the stock's $\beta$ coefficient. On the one hand, it is the frequent changes in investors' strategies that lead to a large systematic error; on the other, as a measure of the fluctuation of a security or a portfolio in comparison with the whole market, $\beta$ is uncertain since the volatility of the financial market has different effects on different stocks in different periods.
If one asks for a rather small $\varepsilon(t)$, there is little prospect of finding reasonable $\Delta t$, $\Delta t'$, and $f_t$ of relatively simple form that are consistent with (1) for arbitrary $t$. In order to analyze the laws of the stock price transformation, we construct a directed weighted network which exhibits the changes of the stock indicators. Firstly, using a sliding window specified by some time division points, the stock data is cut into many segments. Secondly, these segments are mapped to nodes by a coarse-graining method. Finally, edges are directed and weighted in accordance with the chronological order. Let $SS = \{X_t\}_{t \in \Gamma} = (X_{t_0}, X_{t_1}, X_{t_2}, \ldots, X_{t_N})$, where $t_0$ = 09:25:00, Jan. 4, 2015, and $t_N$ = 15:00:00, Jan. 4, 2016. The way of representing $SS$ and its changes by the directed weighted complex network is illustrated in detail below.

2.1. Cutting the Stock Data.
Generally, two ways of segmentation are popular: one uses a fixed time window [9,13,17], and the other is based on local extrema [16,27]. Unfortunately, both of them disregard the change of investors' strategies, which affects the movements of stocks dramatically. Assume that the investors' strategies are adjusted if no trading happens for a certain amount of time; thus the time span between two adjacent trades is regarded as the sign of a change of strategies. For a specified threshold value $\tau$, the subscript set is rewritten so that a division point is added between two adjacent moments whenever their time difference is greater than $\tau$.
$X_{t_i^0,t_i^1}$ is the $i$th segment of the separated stock data, and we hypothesize that strategies remain substantially unchanged between $t_i^0$ and $t_i^1$. For instance, the results of the division ($\tau$ = 9 s) of the stock sh600519 data during 9:25:05-9:32:18 on Jan. 4, 2016, are listed in Table 1.
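The segmentation rule can be illustrated with a minimal sketch. It assumes that the trades are given as time-ordered `(timestamp, price, volume)` tuples; the function name `split_by_gap`, the parameter `gap_seconds`, and the sample trades are hypothetical and are not taken from the sh600519 data.

```python
from datetime import datetime, timedelta

def split_by_gap(trades, gap_seconds):
    """Split time-ordered (timestamp, price, volume) records into segments.

    A new segment starts whenever the time elapsed since the previous
    trade is strictly greater than gap_seconds.
    """
    segments, current = [], []
    for trade in trades:
        if current and (trade[0] - current[-1][0]).total_seconds() > gap_seconds:
            segments.append(current)
            current = []
        current.append(trade)
    if current:
        segments.append(current)
    return segments

# Hypothetical trades (not the real data): offsets in seconds after 09:30:00.
start = datetime(2016, 1, 4, 9, 30, 0)
offsets = [0, 3, 6, 20, 22, 40]
trades = [(start + timedelta(seconds=s), 218.0, 100) for s in offsets]

segments = split_by_gap(trades, gap_seconds=9)
segment_sizes = [len(segment) for segment in segments]
print(segment_sizes)
```

With a threshold of 9 s, the gaps of 14 s and 18 s in the sample produce three segments containing 3, 2, and 1 trades. Overnight gaps exceed any intraday threshold, so in this sketch trading days are separated automatically.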

Coarse-Graining Process.
Typically, coarse-graining means symbolizing the original data by ignoring some information which is subjectively regarded as relatively unimportant, with the aim of analyzing the major characteristics or tendency of a time series by symbolic dynamics. In fact, it is almost impossible to put forward a standard coarse-graining process which fits all kinds of data because of the subjectivity of the process. In practice, the coarse-graining methods used to construct networks from financial time series, such as those mentioned in [21,28], are similar: segments are symbolized by specified thresholds. How to find proper thresholds is still unclear, because too large intervals between thresholds lead to severe information loss, while too small intervals retain excessive details which may conceal the major trend. In order to retain some quantifiable features and represent the evolution trend while coarse-graining, some statistics of the segments are used to carry out this process. The specific procedure is as follows. Let $F_i$ denote the vector of statistics of the $i$th segment for any $i \in \{0, 1, \ldots, n\}$; in particular, if a segment contains a call auction, the trading data of the call auction must be removed.
Let us use the segment $X_{t_1^0,t_1^1}$ in Table 1 as an example and fix $k = 4$.

It is easy to compute the matrix $FM_4$ whose $j$th column is the 4-quantile of $F_{\cdot j}$. $FM_4$ is called a 4-quantile matrix of $F$. Thus, it is easy to see that $s_0 = (2, 4, 4, 0, 0)$, and so on for $s_1, \ldots, s_i, \ldots, s_n$, where $s_i$ is called the $i$th trading status. Obviously, there may exist some $i$ and $j$ satisfying $s_i = s_j$ although $i \neq j$. $s_i$ is called an initial node or a final node if it is the first node or the last node of that day, respectively; otherwise it is called a transitional node. Let NL = {node$_1$, node$_2$, ...} be the ordered set of the distinct $s_i$ in dictionary order. For each $s_i$, there exists a unique node in NL equal to it, which we denote by $G_i$.
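The quantile coding can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of the paper: each segment is assumed to be already summarized by a tuple of statistics, every column is cut at its k-quantile points, and a value is coded by the number of cut points that do not exceed it, which gives codes from 0 to k - 1. The name `quantile_codes` and the sample rows are hypothetical, and the coding convention behind the example status (2, 4, 4, 0, 0) may differ from the one assumed here.

```python
from bisect import bisect_right
from statistics import quantiles

def quantile_codes(features, k):
    """Code every row of `features` by the quantile class of each column.

    Assumption: column j is cut at its k-quantile points (k - 1 cut points),
    and a value is coded by how many cut points are <= the value.
    """
    columns = list(zip(*features))
    cuts = [quantiles(column, n=k, method="inclusive") for column in columns]
    return [
        tuple(bisect_right(cuts[j], value) for j, value in enumerate(row))
        for row in features
    ]

# Hypothetical segment statistics (two columns only, for brevity).
features = [(1, 50), (2, 40), (3, 30), (4, 20), (5, 10)]
statuses = quantile_codes(features, k=4)

# Distinct statuses in dictionary order play the role of the node list NL.
node_list = sorted(set(statuses))
print(statuses)
```

Segments with identical codes collapse into the same node, which is why the node list is usually much shorter than the list of segments.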

Construction of Network.
View the nodes in NL as the vertices of a complex network. The remaining work is to determine their connections. In order to represent the transformations between adjacent dealings on the same day, we link nodes in chronological order. The edges and their weights are assigned as follows.
For arbitrary $i$, if $s_i$ and $s_{i+1}$ occurred on the same day, there are two cases:
(A) If there is no directed edge from $G_i$ to $G_{i+1}$, add such a directed edge and set its weight to 1.
(B) If there is such a directed edge already, add 1 to its weight.
Otherwise, if $s_i$ and $s_{i+1}$ occurred on different days, do nothing.
Let $i$ traverse all the values; thus a directed weighted complex network is constructed.
For any two adjacent nodes node$_i$ and node$_j$, where node$_i$ is the source node and node$_j$ is the target node, node$_j$ is called the successor of node$_i$ and node$_i$ the predecessor of node$_j$. Let the out-degree of a vertex be the number of its out-edges and the out-strength be the sum of the weights of its out-edges; the in-degree and in-strength are defined similarly. We call the result the network of the transformation of the price-volume of the stock sh600519.
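The linking rule and the definitions of degree and strength can be sketched in a few lines. The sketch assumes that each trading status comes with a day label; the function names `build_network` and `degree_and_strength` and the sample sequence are hypothetical.

```python
from collections import defaultdict

def build_network(statuses, days):
    """Return {(source, target): weight} for consecutive same-day statuses."""
    weights = defaultdict(int)
    for i in range(len(statuses) - 1):
        if days[i] == days[i + 1]:  # transitions across days are skipped
            # Cases (A) and (B) at once: a missing edge starts at 0, then +1.
            weights[(statuses[i], statuses[i + 1])] += 1
    return dict(weights)

def degree_and_strength(weights):
    """Compute out-/in-degree and out-/in-strength of every node."""
    out_degree, in_degree = defaultdict(int), defaultdict(int)
    out_strength, in_strength = defaultdict(int), defaultdict(int)
    for (source, target), weight in weights.items():
        out_degree[source] += 1
        in_degree[target] += 1
        out_strength[source] += weight
        in_strength[target] += weight
    return out_degree, in_degree, out_strength, in_strength

# Hypothetical sequence of trading statuses over two days.
a, b, c = (0, 0, 2, 0, 0), (0, 0, 2, 1, 0), (1, 1, 1, 0, 1)
statuses = [a, b, a, b, c, a]
days = [1, 1, 1, 1, 2, 2]

weights = build_network(statuses, days)
out_degree, in_degree, out_strength, in_strength = degree_and_strength(weights)
print(weights)
```

In the sample, the transition from `b` to `c` spans two days and is therefore not linked, while the repeated transition from `a` to `b` yields a single edge of weight 2.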
The weight of a directed edge is the number of linkages from its source node to its target node and corresponds to the frequency of this transition. We deem the edge weight to be positively correlated with the probability of the transition from the source node to the specific target. Clearly, the in-degree of a node corresponds to the node's ability of being a successor of different nodes, the out-degree indicates the diversity of the node's successors, and the in-strength (out-strength) is related to the probability of being a successor (predecessor). The in-degree, out-degree, in-strength, and out-strength of node$_i$ are denoted as $\langle i \rangle_{\mathrm{in}}$, $\langle i \rangle_{\mathrm{out}}$, $\langle i \rangle_{\mathrm{instr}}$, and $\langle i \rangle_{\mathrm{outstr}}$, respectively. For a certain node$_i$, it is easy to deduce the following: if both $|\langle i \rangle_{\mathrm{in}} - \langle i \rangle_{\mathrm{out}}|$ and $|\langle i \rangle_{\mathrm{instr}} - \langle i \rangle_{\mathrm{outstr}}|$ are small, then node$_i$ corresponds to a transition between two different trading statuses; if $\langle i \rangle_{\mathrm{in}}$ is significantly greater than $\langle i \rangle_{\mathrm{out}}$, node$_i$ has a higher probability of being a final node; and if $\langle i \rangle_{\mathrm{in}}$ is significantly smaller than $\langle i \rangle_{\mathrm{out}}$, node$_i$ has a higher probability of being an initial node. This paper concentrates on the relations between in-degree and in-strength as well as between out-degree and out-strength.
Besides, since the above network is completely and uniquely determined by the time threshold $\tau$ and the number of classes $k$ for any given stock data, this way of constructing a complex network is called a $(\tau, k)$-method and the network is denoted as $\mathrm{TPVNET}_{\tau,k}$ (the transformation of the price-volume network). For instance, fix $\tau$ = 9 s and $k$ = 4, and let $\Gamma$ run from 09:25:00, Jan. 4, 2015, to 15:00:00, Jan. 4, 2016; the resulting $\mathrm{TPVNET}_{9,4}$ is shown in Figure 2. In addition, the scaling exponents are listed in Table 2 and Figure 4.

Data Analysis
Almost all curves (point-groups) look like straight lines in double logarithmic axes when the strength (degree) is relatively large, which indicates the scale-free property of all the $\mathrm{TPVNET}_{(\cdot,\cdot)}$. A test of the power law is therefore necessary, and it is done in 2 steps.
Step 1. Estimate the parameters based on the actual data.
Step 2. Test the result with a goodness-of-fit method.
In general, the degree (strength) distribution of the nodes whose degree (strength) is not less than a certain threshold value $x_{\min}$ may obey a power law. In [29], Clauset et al. proposed a principled statistical framework to discern and quantify power-law behavior, which combines a maximum-likelihood fitting method with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic and likelihood ratios. The result of the parameter estimation is shown in Table 2 and Figure 4. The result of the Kolmogorov-Smirnov test is listed in Table 3.
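The two steps can be illustrated with a simplified sketch. It uses the continuous maximum-likelihood estimator of the scaling exponent, $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$, and the Kolmogorov-Smirnov distance between the empirical and the fitted distribution of the tail. It treats the threshold as given and does not compute a p-value, both of which the full framework of [29] handles; the function name `fit_power_law` and the synthetic sample are assumptions made for illustration.

```python
import math

def fit_power_law(values, x_min):
    """Continuous MLE of the exponent and KS distance for the tail x >= x_min.

    Simplification: x_min is fixed by the caller, and the data are treated
    as continuous although degrees are integers.
    """
    tail = sorted(v for v in values if v >= x_min)
    n = len(tail)
    log_sum = sum(math.log(v / x_min) for v in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("the tail must contain values larger than x_min")
    alpha = 1.0 + n / log_sum
    ks = 0.0
    for i, v in enumerate(tail):
        model_cdf = 1.0 - (v / x_min) ** (1.0 - alpha)
        # Compare with the empirical CDF just after and just before v.
        ks = max(ks, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
    return alpha, ks

# Synthetic tail with exponent 2.5, built by inverse-transform sampling
# on a regular grid so that the example is deterministic.
true_alpha, x_min, n = 2.5, 1.0, 1000
sample = [
    x_min * (1.0 - (i + 0.5) / n) ** (-1.0 / (true_alpha - 1.0))
    for i in range(n)
]
alpha_hat, ks_distance = fit_power_law(sample, x_min)
print(round(alpha_hat, 3), round(ks_distance, 4))
```

A small KS distance alone does not establish a power law; in the framework of [29] it is compared with the distances obtained from synthetic samples to produce the p-value.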

Network Statistical Characters.
Only $\mathrm{TPVNET}_{60,5}$ and $\mathrm{TPVNET}_{120,5}$, which passed the K-S test, have scaling exponents of both distributions between 2 and 3. We take $\mathrm{TPVNET}_{60,5}$ for further discussion. By the construction of $\mathrm{TPVNET}_{60,5}$ we know that it could consist of up to $5^5 = 3125$ nodes. In fact, there are 481 nodes and 2738 edges; the sum of the edge weights is 5536. Figure 5 displays the nodes' degree and strength.
Studying degree and strength and their relationship is useful for understanding the evolution process of networks as well as for link prediction [30]. Table 4 exhibits the top 10 nodes of out-degree, out-strength, in-degree, and in-strength. Furthermore, this information is shown in Figure 6.
From Table 4 and Figure 6 we found the following.
(1) 00200, 00210, 00300, 11101, and 11111 are in all four sequences; they correspond to the highest-frequency transitional nodes between different trading statuses because their out-degree and in-degree are large and their out-strength and in-strength, which are roughly equal, are large too. The transitional nodes of this kind have the following features: the duration is less than 4531 s, the avg. trade volume (per second) is less than 1.406, the avg. return lies between −0.003465 and 0.0004174, and the standard deviation and the range of the returns are less than 0.001739 and 0.007528, respectively.
(2) The difference between sequence$_1$ and sequence$_2$ (or sequence$_3$ and sequence$_4$) indicates that degree is not the only factor affecting strength.
(3) Obviously, a strong positive correlation should exist between out-degree and out-strength and between in-degree and in-strength. There are 510 nodes with positive out-degree and 562 nodes with positive in-degree.
Figure 6: Top 10 of out-degree, out-strength, in-degree, and in-strength, where $(i)$ means the node is one of sequence$_i$. For example, "44442(1, 2)" means that node 44442 is one of the top 10 nodes of out-degree and out-strength, so it belongs to sequence$_1$ and sequence$_2$.
The correlation coefficient is 0.9223 between out-degree and out-strength and 0.9311 between in-degree and in-strength. A scatter plot exhibits their linear relations; see Figure 7. Given the significantly positive correlation between out-degree and out-strength visible in Figure 7, the relationship between them is modeled by polynomial regressions. The result is listed in Table 5.
The relationship between in-degree and in-strength is similar and the results of polynomial regressions are listed in Table 6.
The result of the higher-order polynomial regression is significantly better than that of the linear one, which shows that the increase of the out-strength (in-strength) is inhomogeneous, and we deem that this increase is due to at least two different factors:
(A) The increase of out-degree, which represents the relation between the node and its neighbours, is regarded as a kind of external cause.
(B) Some features of the node, which reflect its ability of being a successor, are regarded as a kind of internal cause.
Nodes' attributes describe the corresponding transaction status, including the average return and the range of return; the network's statistical characters, such as degree, closeness, and centrality, represent the relations between nodes and their neighbours, how the nodes are embedded in the network, and the importance of each node. As mentioned in Table 6, the relationship between degree and strength is significantly influenced by nodes' attributes. We utilize hierarchical multiple regression (HMR) to model how the nodes' attributes affect the relationship between degree and strength. The following work concentrates on quantifying this effect, that is, the internal cause. To simplify our work, we consider only the impact of return on the nodes whose degree is not less than 1.
The result of HMR is listed in Table 8. The influences of the interactions of all the dummy variables and DOD on DV are significant, and so is that of $x_8$.
We found that there is no significant difference in explanatory power between models 2-1 and 2-2, even though $x_8$ is removed in model 2-2. Thus, the moderated regression equation on out-strength with interaction effects, (6), is built on $x_1$, the deflated out-degree, and on $x_2$, $x_4$, $x_6$, and $x_8$, the dummy variables corresponding to level$_0$, level$_1$, level$_3$. The equation on in-strength, (7), is analogous: DV is the in-strength and the $x_i$ are the deflated in-degree or interactions of nodes' deflated in-degree and average return level, like those of out-degree. To verify (6) and (7), let the stock data from 9:00, Jan. 5, 2016, to 15:00, Apr. 4, 2016, together with the former data be the testing set. See Figure 8 for details.

Edges Weight.
Let the vertex set of $\mathrm{TPVNET}_{60,5}$ follow dictionary order, regarding each node as a 5-dimensional vector. The adjacency matrix of $\mathrm{TPVNET}_{60,5}$ is shown in Figure 9.
For an edge $e_{i,j}$, that is, a directed edge from node$_i$ to node$_j$, let $\mathrm{dis}_{i,j} = |\mathrm{node}_i - \mathrm{node}_j|$, where $|\cdot|$ is the Euclidean distance, be called the length of edge $e_{i,j}$, and let $\mathrm{Dis} = \{\mathrm{dis}_{i,j}\}$ be the set of lengths in ascending order. Let $\mathrm{soe}(d) = \sum_{\mathrm{dis}_{i,j} = d} w_{i,j}$, where $w_{i,j}$ is the weight of $e_{i,j}$. In order to investigate the relationship between the length of edges and the linking possibilities, we take $\mathrm{dis}_{i,j}$ as the $x$ coordinate and $w_{i,j}$ or $\mathrm{soe}(\mathrm{dis}_{i,j})$ as the $y_1$ (blue) or $y_2$ (red) coordinate, respectively, in Figure 10(a); the cumulative proportions of soe and of the number of edges according to the edge lengths are shown in Figure 10(b).
The above analysis illustrates that the length of the transformation between source node and target node is basically negatively related to its frequency; thus the heterogeneity of nodes cannot be ignored when we consider the characters of $\mathrm{TPVNET}_{60,5}$.
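The edge length and the sum of edge weights per length can be computed as in the following sketch. It assumes that nodes are stored as tuples of integer codes and that the network is given as a dictionary from (source, target) pairs to weights; the name `weight_by_length` and the sample edges are hypothetical.

```python
import math
from collections import defaultdict

def edge_length(source, target):
    """Euclidean distance between two nodes viewed as vectors."""
    return math.dist(source, target)

def weight_by_length(weights):
    """Sum the edge weights for every distinct edge length (soe)."""
    soe = defaultdict(int)
    for (source, target), weight in weights.items():
        # Rounding groups lengths that differ only by floating-point noise.
        soe[round(edge_length(source, target), 6)] += weight
    return dict(sorted(soe.items()))

# Hypothetical edges leaving node 00200, including a self-loop.
origin = (0, 0, 2, 0, 0)
weights = {
    (origin, origin): 4,
    (origin, (0, 0, 2, 1, 0)): 5,
    (origin, (0, 0, 3, 0, 0)): 3,
    (origin, (1, 1, 1, 0, 1)): 1,
}
soe = weight_by_length(weights)
print(soe)
```

In the sample, the two edges of length 1 carry a total weight of 8 while the single edge of length 2 carries a weight of 1, which mirrors the negative relation between length and frequency described above.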

Other Statistical Features.
Besides degree and strength, there are many other statistical characters, such as the clustering coefficient and betweenness centrality, that are used to describe the topological features of a complex network; see [31]. Unfortunately, few of them can be used directly on $\mathrm{TPVNET}_{60,5}$ since it is a directed weighted network. If only the transformation of the stock trading status is considered, the edge weights can be omitted; thus $\mathrm{TPVNET}_{60,5}$ degenerates to a directed unweighted network denoted by $\mathrm{TPVNET}^{\circ}_{60,5}$. Some statistical features of $\mathrm{TPVNET}^{\circ}_{60,5}$ are calculated; see Table 11. From Table 11 we can see that it takes 4.05 alternations on average, and at most 11, to get from any trading status to another. Links exist between only 0.4% of the pairs of nodes. $\mathrm{TPVNET}^{\circ}_{60,5}$ is divided into 18 communities; the proportion of the number of alternations inside these communities over the total number in the whole network is more than 42.8%.
Figure 8: Testing of the relationship of degree and strength. Most of the residuals are small, but there are still several nodes with large deviance. In fact, the deviance increases as the out-degree or in-degree grows, which indicates that there are still some other factors, besides the return, that may affect the growth of strength.
That the average clustering coefficient is significantly larger than that of a random graph, which is less than 0.01, indicates the small world property of $\mathrm{TPVNET}^{\circ}_{60,5}$. The distributions of clustering coefficients, betweenness centralities, eigenvector centralities, and PageRanks are shown in Figure 11. To analyze the influence of return on the importance of nodes, the Kruskal-Wallis test is adopted to distinguish the effects of different levels of return on some of the nodes' measurements which are non-Gaussian. They are closeness centrality (CC), harmonic

Conclusion and Discussion
We introduce a $(\tau, k)$-method to construct a directed weighted complex network which models the transformation of the trading data of an individual stock sh600519 in 5 years; that is, divide the data by a time interval threshold value ($\tau$), coarse-grain according to some statistics and the number of classifications ($k$), and link and weight the nodes in a chronological way. The derived $\mathrm{TPVNET}_{(\cdot,\cdot)}$ satisfies the power law for certain combinations of $(\tau, k)$, such as $\mathrm{TPVNET}_{60,5}$. We found the inhomogeneity of the increase of strength and then built moderated regression equations on out-strength (see (6)) and in-strength (see (7)) with interaction effects. Besides, the heterogeneity of nodes affects the structure of the network dramatically; the level of the avg. return of trading statuses impacts nodes' eigenvector centrality as well as PageRank.
It is easy to see that a network constructed by the $(\tau, k)$-method loses a great deal of information of the time series, so revealing all the rules of the transformation is still difficult. Another weakness is the lack of related theoretical support, which is an obstacle to building its evolution dynamic models. Besides, if the distributions of $F_{\cdot j}$ can be utilized appropriately, the results may be greatly improved.

Mathematical Problems in Engineering
Figure 11: Statistical features actually do not follow Gaussian distributions. (a) There are 11 nodes whose clustering coefficients are equal to the maximum, but their degree is not more than 3. If we limit the degree of nodes to not less than 5 and 10, then the maximum clustering coefficients are 0.9167 and 0.395, respectively. It means that $\mathrm{TPVNET}^{\circ}_{60,5}$ is a blank graph. (b, c, and d) To make the pictures clearer, we use $\log_{10}(\cdot)$ instead. All of the three statistical characters describe the importance of nodes. In fact, node 00200 is the maximum node of each, and the differences among their distributions are significant.

Figure 2: There are 321 nodes and 16669 edges, and the sum of the edge weights is 144884, which means 321 kinds of trading status, 16669 kinds of different transitions, and 144884 transitions in total, respectively.

Figure 4: Most (i.e., 24/30 = 80%) values of  lie between 2 and 3, which shows that TPVNET (⋅,⋅) is similar to artificial technical networks, the cost of linking heavily and the edges linearly depend on the nodes[26]. min is decreasing as  is growing because of the diminishing of nodes number.Besides, (b) exhibits that  and  are positively related.

Figure 7: The linear relation between degree and strength.

Figure 9: The graph of the adjacency matrix, where the $x$- and $y$-axes are the serial numbers of source and target nodes; the color (size) of the circles represents the edge weight. In brief, the source nodes and target nodes of the edges with relatively large weight are similar.

Figure 10: (a) In general, the larger the edge length, the smaller the edge weight. (b) It is obvious that the distribution of edge weights (or of the number of edges) is nonuniform. For instance, more than 80% of the edge lengths are less than 4.47, which corresponds to the smallest 40% of the values in Dis; approximately 60% of the edge weight is carried by the smallest 20% of the values in Dis.
(1) $t_0$ denotes the time of the first trade on Jan. 4, 2016, which is the 520571st trade in SS. It should be written as $t_{520571}$ because $t_i$ is the moment of the $i$th (in SS) stock deal. Here we still use $t_0$ rather than $t_{520571}$ just for convenience, and likewise for the other indexed symbols. (2) Different values of $\tau$ correspond to different segmentations. For instance, $\tau$ = 1 day means separating the stock data every week; $\tau$ = 0.5 day means separating every day.

Table 3: p-value of K-S testing.

Table 4: Top 10 nodes of out-degree, out-strength, in-degree, and in-strength.

Table 7: Dependent variable and predictors.