Dynamic Communities in Stock Market

Abstract and Applied Analysis


Introduction
The stock market is a dynamic complex system formed by many enterprises, institutions, and individuals, which are connected with each other by trade, investment, and so forth. The stock market has a great influence on a country's or region's economic and financial activities, yet it is very difficult to predict and control. There is an urgent need for a new, global understanding of the structure and dynamic characteristics of the stock market.
Previous research on stock markets mainly focused on competition and cooperation games under specified conditions. Because of the high complexity of the stock market, conclusions drawn under such limited conditions often reflect only part of the problem. This forces us to adopt a new mode of study, from a more macroscopic perspective, to explore the characteristics of, and the reasons behind, the complex changes of stock markets.
From chaos to complexity, from the molecular activities of cells in our body to the communications between people across the entire planet, complex network theory provides us with a new method to explore the world. In particular, since Barabási published his pioneering paper on scale-free networks 15 years ago [1], complex networks have attracted the interest of many researchers in different fields around the world, and a large number of research results have been produced in recent years. These achievements provide a powerful tool and reference for understanding real-world complex systems, such as protein interaction networks in biology and social networks and scientific collaboration networks in sociology [2,3]. The theory and tools of complex networks also provide a new perspective for studying stock markets. The price of stocks is the final and most essential reflection of a stock market; therefore, in this paper, we construct stock networks based on stock prices and study the evolution characteristics of their community structure by using complex network theory and tools. The evolution of community structure over time reflects not only the changes of a stock group but also the global features of the stock market. Through this new research approach, from a more macroscopic point of view, we can mine and reveal the underlying characteristics and laws hidden in the big data of stock markets.
In this paper, we construct stock networks based on the prices of stocks in a stock market. A stock network is a graph consisting of nodes (vertices) and edges, where nodes correspond to stocks (companies) and edges correspond to price fluctuation relationships, which are obtained by computing a correlation coefficient for each pair of stocks. Mantegna was the first to construct stock networks based on stock price correlations [4]. After that, many papers based on stock price correlations were presented. For example, Onnela et al. studied split-adjusted daily closure prices for a total of $N = 477$ stocks traded at the New York Stock Exchange over a period of 20 years, from 2 January 1980 to 31 December 1999 [5,6]. They constructed dynamic asset graphs and dynamic asset trees based on price correlations and discussed their properties and differences. Kullmann et al. studied the clustering of companies within a specific stock market index, such as the Dow Jones (DJ) or the Standard & Poor's 500 (S&P 500), by using the Potts superparamagnetic method [7]. They constructed an appropriate q-state Potts model, where the spins correspond to companies and the interactions are functions of stock price correlations. Boginski et al. studied characteristics of the stock network representing the structure of the US stock market and detected cliques and independent sets in it [8]. Jallo et al. constructed three kinds of stock networks based on the American and Swedish stock markets and compared the characteristics of the three construction methods [9]. Vizgunov et al. constructed stock networks for different time periods from 2007 to 2011, based on the Russian stock market. They found that, for the Russian market, there is a strong connection between the volume of stocks and the structure of maximum cliques for all periods of observation [10,11]. Huang et al. constructed a correlation network of the Chinese stock market using the threshold method and then studied the structural properties and the topological stability of the network [12].
More specifically, in this paper, we study the split-adjusted daily closure prices of $N = 400$ stocks traded at the Hong Kong Exchanges (HKEx) over a period of 10 years and construct stock networks based on stock price correlations. Different from the literature mentioned above, we focus on the properties of dynamic communities in the networks, since one of the most relevant features of networks representing real systems is community structure, that is, the organization of nodes into groups with many edges joining nodes of the same community and comparatively few edges joining nodes of different communities [13]. Moreover, a financial market is characterized as an evolving complex system [14], so the evolution (or change) of communities is analyzed in this paper. The basic events that may occur in the evolution of a community are birth, growth, contraction, merger with other communities, split, and death, which were systematically proposed by Palla et al. [15]. We therefore believe that the analysis of dynamic communities in a stock market is more meaningful than that of static ones and offers a new macroscopic perspective for understanding a stock market. Through the analysis, we find the following phenomena. First, the evolution of communities in stock networks differs from that in other networks, such as the social networks in the literature [15]. Second, there is a correlation between the characteristics of the dynamic community structure and the fluctuation of the stock market. These results potentially contribute to market analysis and decision-making.
The paper is structured as follows. Section 2 describes how to construct stock networks. In Section 3, we describe how to detect and match communities in the networks. The analysis of dynamic communities is then offered in Section 4. Finally, in Section 5, we summarize our findings and present some thoughts on future research.

Constructing Stock Networks
In this paper, the term stock networks refers to a set of undirected graphs, where the nodes correspond to stocks and the edges correspond to correlation coefficients between them. The data set consists of daily closure prices of stocks traded at the Hong Kong Exchanges (HKEx). We chose $N = 400$ stocks and collected the stock data over a period of 10 years, from 3 January 2000 to 6 August 2010. We construct networks from the split-adjusted daily closure prices of the stocks, with a total of 2616 price quotes per stock. The data is divided into $M$ windows of width $T$ in order to uncover dynamic characteristics of the networks. The window width $T$ corresponds to the number of daily returns included in the window. Consecutive windows overlap with each other. The starting time of a window is determined by the window step length parameter $\delta T$, which describes the displacement of the window, measured in trading days. The data windowing method and some associated parameters are illustrated in Figure 1.
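As an illustration of the windowing scheme, the following minimal Python sketch lists the start index of each window. It assumes that only full windows are kept and that the 2616 price quotes yield 2615 daily returns; the function name `window_starts` and its parameters are hypothetical and not taken from the original study.

```python
def window_starts(n_returns, width, step):
    """Start index of every full window of `width` consecutive returns."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    # A window starting at s covers returns s .. s + width - 1,
    # so the last admissible start is n_returns - width.
    return list(range(0, n_returns - width + 1, step))


# Assumption: 2616 price quotes per stock give 2615 daily log returns.
starts = window_starts(n_returns=2615, width=200, step=21)
print(len(starts), starts[:3], starts[-1])
```

Under these assumptions, a window width of 200 returns and a step of 21 trading days give 116 windows, which agrees with the number of windows reported later in this section.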
Let $P_i(\tau)$ be the closure price of stock $i$ at time $\tau$, where $\tau$ refers to a date. Given a time window $t$, $t = 1, 2, \ldots, M$, let the return vector of stock $i$ in window $t$ be $r_i^t$, whose components are the logarithmic returns of stock $i$ in window $t$; that is, $r_i(\tau) = \ln P_i(\tau) - \ln P_i(\tau - 1)$, where $\tau$ ranges from the second trading day in window $t$ to the first trading day in the next window $t + 1$. To investigate correlations between stocks in window $t$, the correlation coefficient between stocks $i$ and $j$ is defined as

$$\rho_{ij}^t = \frac{\langle r_i^t r_j^t \rangle - \langle r_i^t \rangle \langle r_j^t \rangle}{\sqrt{\left[\langle (r_i^t)^2 \rangle - \langle r_i^t \rangle^2\right]\left[\langle (r_j^t)^2 \rangle - \langle r_j^t \rangle^2\right]}},$$

where $\langle \cdot \rangle$ indicates a time average over $T$ days. The correlation coefficient $\rho_{ij}^t$ fulfills the condition $-1 \le \rho_{ij}^t \le 1$, and its value reflects the level of correlation between stock $i$ and stock $j$, from perfect correlation ($\rho_{ij}^t = 1$) to perfect anticorrelation ($\rho_{ij}^t = -1$). These correlation coefficients form an $N \times N$ correlation matrix $C^t$, which is the basis of the stock networks constructed in this paper.
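The return and correlation computations described above can be sketched in plain Python as follows. The sketch assumes that the correlation coefficient is the standard Pearson coefficient built from time averages over one window; the helper names (`log_returns`, `correlation`, `correlation_matrix`) and the toy prices are illustrative only and do not come from the data set.

```python
import math


def log_returns(prices):
    """Logarithmic returns ln P(tau) - ln P(tau - 1) of one price series."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation of two equally long return series.

    Raises ZeroDivisionError if either series is constant.
    """
    mx, my = mean(x), mean(y)
    cov = mean([a * b for a, b in zip(x, y)]) - mx * my
    var_x = mean([a * a for a in x]) - mx * mx
    var_y = mean([b * b for b in y]) - my * my
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(returns_by_stock, start, width):
    """Correlation matrix of all stocks inside one time window."""
    window = [r[start:start + width] for r in returns_by_stock]
    n = len(window)
    return [[1.0 if i == j else correlation(window[i], window[j])
             for j in range(n)] for i in range(n)]


# Toy data (hypothetical): stock 1 has the same returns as stock 0.
prices = [
    [10.0, 11.0, 12.1, 12.0, 13.0],
    [20.0, 22.0, 24.2, 24.0, 26.0],
    [5.0, 4.8, 5.1, 5.3, 5.0],
]
returns = [log_returns(p) for p in prices]
matrix = correlation_matrix(returns, start=0, width=4)
```

In this toy example the first two stocks are perfectly correlated, so the corresponding matrix entry is 1 up to rounding error.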
To construct stock networks, we first need to discuss the two parameters $T$ and $\delta T$. Onnela et al. have used this kind of correlation coefficient, and their method of dividing the data into time windows to construct asset graphs can be found in the literature [6]. They also point out that the choice of $T$ is a trade-off between too noisy and too smoothed data for small and large $T$, respectively. They find that $\delta T = 21$ days and $T = 1000$ days are optimal values [16]. However, through many experiments, we find that $T = 1000$ is not suitable for our data, although $\delta T = 21$ is a good choice. Some experimental results can be seen in Figures 2 and 3.
Let $T \in \{1000, 400, 200, 100\}$ and let the window step length $\delta T = 21$ (fixed at about one month). Figure 2 shows four plots of the mean correlation coefficient as a function of time, defined as

$$\bar{\rho}(t) = \frac{2}{N(N-1)} \sum_{i<j} \rho_{ij}^t.$$

To give a clearer picture of the correlation coefficients, Figure 3 shows four contour plots of probability density functions for the correlation coefficients with different $T$ values. Visually, it is difficult to say which $T$ value is optimal. Setting $T = 1000$ in Figures 2(a) and 3(a) seems to make the data too smooth, which may lose too much market information. On the contrary, setting $T = 100$ in Figures 2(d) and 3(d) seems to make the data too noisy. From Figures 2 and 3, we can also find that stocks are more closely associated with each other when the stock market falls. This phenomenon is described by the commonly heard phrase "decline is characterized by the stocks moving together."

Setting $T = 200$ and $\delta T = 21$, the overall number of windows is $M = 116$; that is, $t = 1, 2, \ldots, 116$. With these choices, we can construct the stock networks $G^t$ based on the correlation matrix $C^t$, by simply considering $C^t$ as the adjacency matrix of $G^t$. The networks $G^t$ are then weighted undirected complete graphs. However, it is hard to analyze the community structure in these complete graphs. Since these graphs represent the market, it is natural to construct graphs that include only the strongest connections. But how many edges (connections) should be included in such graphs? From Figure 4(a), we can see that the fewer edges are included, the fewer nodes are incident with at least one edge. That is, if we include few edges, many nodes become isolated, which loses a lot of useful information. On the contrary, if we include too many edges, the graphs will not have a distinct community structure, as measured by modularity values and as Figure 4(b) shows. The modularity is used as an indicator of community structure; it measures the density of links inside communities as compared to links between communities. It is defined as

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),$$

where $A_{ij}$ represents the weight of the edge between nodes $i$ and $j$, $k_i = \sum_j A_{ij}$ is the sum of the weights of the edges attached to node $i$, and $c_i$ is the community to which node $i$ is assigned. The communities are detected by the algorithm of Blondel et al. [17], which will be introduced in the next section. The function $\delta(u, v)$ equals 1 if $u = v$ and 0 otherwise, and $m = \frac{1}{2} \sum_{i,j} A_{ij}$ is the total weight of the edges of the graph.
In practice, modularity values of many real networks typically fall in the range from about 0.3 to 0.7 [18]. When the modularity value of a network is below 0.3, it can be considered to have no distinct community structure. Therefore, in this paper, we include 1.2% of the total edges to construct stock networks, selecting edges in descending order of correlation value and deleting all isolated nodes. The average modularity value of the networks is then 0.302 and the average coverage of nodes reaches 35.2%. Finally, the stock networks $G^t$ are weighted undirected graphs with a fixed number of 957 edges and 140.68 nodes on average. Figure 5 shows several pictures of stock networks $G^t$. In the figure, different node colors represent different communities. As we expected, nodes in the same community are basically stocks belonging to the same industry, which also fits the stock movements in the real market.
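The edge selection and the modularity measure can be illustrated with the following hedged Python sketch. It assumes that edges are ranked by raw correlation value and that the graph has no self-loops; `strongest_edges` and `modularity` are hypothetical helper names, and the community assignment is supplied by hand here rather than detected.

```python
def strongest_edges(corr, fraction):
    """Keep the given fraction of node pairs with the largest correlation."""
    n = len(corr)
    pairs = [(corr[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    pairs.sort(reverse=True)
    keep = int(fraction * len(pairs))
    # Nodes that appear in no kept edge are implicitly dropped as isolated.
    return {(i, j): w for w, i, j in pairs[:keep]}


def modularity(edges, community):
    """Weighted modularity Q of a partition; edges is {(i, j): weight}."""
    strength = {}
    for (i, j), w in edges.items():
        strength[i] = strength.get(i, 0.0) + w
        strength[j] = strength.get(j, 0.0) + w
    m = sum(edges.values())  # total edge weight
    internal = sum(w for (i, j), w in edges.items()
                   if community[i] == community[j])
    total = {}  # summed node strength per community
    for node, k in strength.items():
        total[community[node]] = total.get(community[node], 0.0) + k
    # Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j)
    return internal / m - sum((t / (2 * m)) ** 2 for t in total.values())


# Two triangles joined by a single edge, all weights equal to 1.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0, (2, 3): 1.0}
partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, partition)
```

For 400 stocks there are 79,800 node pairs, so a fraction of 0.012 keeps 957 edges in this sketch, consistent with the fixed edge number stated above.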

Detecting and Matching Communities in Time Windows
After constructing the stock networks, we detect communities in each time window using the algorithm of Blondel et al., which is introduced in the literature [17] and has been widely used for weighted graphs. There are several reasons why we choose this algorithm. First, stock networks are weighted graphs, and the quality of the communities detected by Blondel's algorithm is very good, as measured by the weighted modularity. Second, this algorithm can unfold a complete hierarchical community structure for a network, which is very useful for further studies of hierarchical structures in stock networks. Third, the algorithm is extremely fast; it is shown in the literature [17] that this algorithm outperforms all other known community detection methods in terms of computation time.
Blondel's algorithm uses a greedy method based on weighted modularity optimization. Initially, all nodes of a graph are put in different communities. Then, the algorithm is divided into two phases that are repeated iteratively. The first phase consists of a sequential sweep over all nodes until no further improvement of modularity can be achieved. At the end of the first phase, the first-level partition is obtained. In the second phase, a new network is built whose nodes are the communities found during the first phase. The two phases of the algorithm are then iterated, yielding new hierarchical level partitions, until there are no more changes and a maximum of modularity is attained. The details of the algorithm can be found in the literature [17].
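To make the first phase more concrete, the following Python sketch performs only the sequential node sweep, moving each node to the neighbouring community with the largest modularity gain. It is a simplified illustration under stated assumptions (integer node labels, a fixed visiting order, no second aggregation phase), not the reference implementation of [17]; the name `local_moving` is hypothetical.

```python
def local_moving(edges):
    """First-phase sweep on {(i, j): weight}; returns node -> community."""
    neighbours = {}
    for (i, j), w in edges.items():
        neighbours.setdefault(i, {})[j] = w
        neighbours.setdefault(j, {})[i] = w
    strength = {v: sum(nb.values()) for v, nb in neighbours.items()}
    two_m = sum(strength.values())
    community = {v: v for v in neighbours}  # every node starts alone
    comm_total = dict(strength)             # summed strength per community
    improved = True
    while improved:
        improved = False
        for v in sorted(neighbours):
            old = community[v]
            comm_total[old] -= strength[v]  # take v out of its community
            links = {old: 0.0}              # weight from v to each community
            for u, w in neighbours[v].items():
                links[community[u]] = links.get(community[u], 0.0) + w

            def gain(c):
                # Modularity gain of putting v into c, up to a factor 1/m.
                return links[c] - comm_total[c] * strength[v] / two_m

            best = old
            for c in sorted(links):
                if gain(c) > gain(best) + 1e-12:  # strict improvement only
                    best = c
            comm_total[best] += strength[v]
            if best != old:
                community[v] = best
                improved = True
    return community


# Two triangles joined by a single edge, all weights equal to 1.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0, (2, 3): 1.0}
membership = local_moving(edges)
groups = {frozenset(v for v in membership if membership[v] == c)
          for c in set(membership.values())}
```

On this toy graph the sweep separates the two triangles. Because a node is moved only when the gain strictly increases, the loop terminates; the full algorithm would then aggregate each community into a single node and repeat.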
After communities have been detected in each time window separately, communities in succeeding time windows have to be matched with each other in order to analyze the characteristics of dynamic communities. We use the matching method proposed in the literature [15]. The method is a process of finding counterparts. Communities from consecutive time windows are matched in descending order of their relative node overlap (i.e., the Jaccard similarity coefficient). The relative node overlap between communities $A$ and $B$ is defined as $C(A, B) = |A \cap B| / |A \cup B|$, where $|A \cap B|$ is the number of nodes in the intersection of $A$ and $B$ and $|A \cup B|$ is the number of nodes in the union of the two communities. When a community has no counterpart among the communities in the previous time window, it is considered a newborn community; when it has no counterpart in the next time window, it is considered to have finished its life.
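The matching step can be sketched as a greedy pairing, shown below in Python. The sketch assumes that each community has at most one counterpart and that communities sharing no node are never matched; these details, as well as the name `match_communities`, are assumptions for illustration and may differ from the procedure in [15].

```python
def jaccard(a, b):
    """Relative node overlap: intersection size over union size."""
    return len(a & b) / len(a | b)


def match_communities(previous, current):
    """Greedily pair communities of consecutive windows by overlap.

    Returns (pairs, born, died): matched index pairs previous -> current,
    indices of current communities without a predecessor, and indices of
    previous communities without a successor.
    """
    candidates = [
        (jaccard(a, b), i, j)
        for i, a in enumerate(previous)
        for j, b in enumerate(current)
        if a & b  # assumption: zero-overlap pairs are not candidates
    ]
    # Largest overlap first; ties are broken by index for determinism.
    candidates.sort(key=lambda item: (-item[0], item[1], item[2]))
    pairs = {}
    used_current = set()
    for _, i, j in candidates:
        if i not in pairs and j not in used_current:
            pairs[i] = j
            used_current.add(j)
    born = [j for j in range(len(current)) if j not in used_current]
    died = [i for i in range(len(previous)) if i not in pairs]
    return pairs, born, died


previous = [{1, 2, 3, 4}, {5, 6}, {10, 11}]
current = [{1, 2, 3}, {7, 8}, {5, 6, 9}]
pairs, born, died = match_communities(previous, current)
```

In this toy example the first two communities find counterparts, the community `{7, 8}` is newborn, and `{10, 11}` has finished its life.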

Characters of Dynamic Communities in Stock Networks
First, we investigate some basic statistical properties characterizing the dynamics of stock networks, namely, the distributions of the coverage of networks (the ratio of nodes contained in a network), the community number, the modularity value $Q$, and the overall community size. The results are shown in Figure 6. Overall, 961 communities are detected across all time windows. The maximum community size is 83 and the minimum is 2. In Figure 6(a), we show the overall community size distribution, which resembles a power-law distribution. From Figures 6(b), 6(c), and 6(d), we can see that these three curves show similar trends. Because the number of edges in the networks is fixed, when the coverage ratio gets smaller, the connections among the remaining nodes become denser and the community structure becomes less distinct. Using the bear market in the period from October 2007 to August 2010 as a reference again, the figures clearly show that these three curves are at a low level in this period. This implies that, when the market declines, a few stocks have stronger connections with each other, and these connections are so tight that the network cannot form a distinct community structure. Conversely, when the market is good, more stocks have strong connections and easily form communities. Furthermore, this suggests that the modularity level of stock networks can reflect whether the market is good or bad.
Second, we consider a basic quantity characterizing a dynamic community, its age $\tau$, representing the time passed since its birth. In total, 243 dynamic communities can be extracted from all communities in the time windows. The average age of dynamic communities is 3.95. Figure 7 illustrates the age distribution of dynamic communities, which displays a power-law shape. Most dynamic communities are very young, with 92.6% having an age below 10, which partly reveals the highly dynamic nature of the stock market. Figures 8 and 9 show the correlations of the dynamic community age $\tau$ with the start size $s$ of a dynamic community and with the dynamic community stationarity $\zeta$, respectively. The stationarity $\zeta_i$ of a dynamic community (say $D_i$) is defined as the average correlation between subsequent states,

$$\zeta_i \equiv \frac{\sum_{t=t_0}^{t_{\max}-1} C(D_i^t, D_i^{t+1})}{t_{\max} - t_0},$$

where $t_0$ denotes the birth of the community $D_i$, $t_{\max}$ is the last step before the extinction of the community $D_i$, and $C$ denotes the Jaccard similarity coefficient mentioned in Section 3. The stationarity $\zeta$ represents the stability of community components during the lifetime of a dynamic community. The larger $\zeta$ is, the smaller the change of components is. From Figure 8, we cannot find a clear correlation between the size and the age.
This is not what we expected; we thought that larger communities might be older on average, as in the social relation networks in the literature [15]. The correlation between the stationarity and the age is relatively more obvious, as shown in Figure 9. It suggests that dynamic communities with a stationarity of around 0.7 tend to be older. Intuitively, a tightly linked community will probably have a longer lifetime. To verify this guess, for each community, we measured the total weight inside the community ($w_{in}$) as well as outside the community ($w_{out}$). Then, we calculated the average age ($\langle \tau \rangle$) as a function of $\langle w_{in}/(w_{in} + w_{out}) \rangle$ of a dynamic community. However, the curve in Figure 10 reaches its peak at 0.6, so we find that more tightly linked dynamic communities are not necessarily older.
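The two quantities used in this section can be computed as in the following Python sketch. It assumes that a dynamic community is given as the list of its node sets in consecutive windows and that the outside weight counts edges with exactly one endpoint in the community; `stationarity` and `tightness` are illustrative names, not taken from the original study.

```python
def jaccard(a, b):
    """Relative node overlap: intersection size over union size."""
    return len(a & b) / len(a | b)


def stationarity(states):
    """Average overlap between consecutive states of a dynamic community."""
    if len(states) < 2:
        raise ValueError("need at least two consecutive states")
    overlaps = [jaccard(a, b) for a, b in zip(states, states[1:])]
    # There is one overlap term per step between birth and last state.
    return sum(overlaps) / len(overlaps)


def tightness(edges, members):
    """Ratio w_in / (w_in + w_out) for one community.

    Raises ZeroDivisionError if no edge touches the community.
    """
    w_in = w_out = 0.0
    for (i, j), w in edges.items():
        inside = (i in members) + (j in members)
        if inside == 2:
            w_in += w   # both endpoints inside the community
        elif inside == 1:
            w_out += w  # edge leaving the community
    return w_in / (w_in + w_out)


# A toy community observed in three consecutive windows.
states = [{1, 2, 3}, {1, 2, 3}, {1, 2}]
zeta = stationarity(states)

# Two triangles joined by a single edge, all weights equal to 1.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0, (2, 3): 1.0}
ratio = tightness(edges, {0, 1, 2})
```

Here the community keeps all members in the first step and loses one of three in the second, so its stationarity is the mean of 1 and 2/3; the first triangle has three internal edges and one outgoing edge, giving a ratio of 0.75.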

Summary and Further Studies
In summary, we have introduced some characteristics of dynamic communities in stock networks. The way of constructing stock networks and detecting communities can also be used in other similar complex systems. There are two main results obtained in this paper. [...] are both not good for a dynamic community's long life. Instead, when the tightness of a community ($w_{in}/(w_{in} + w_{out})$) and the stationarity of a community ($\zeta$) are about 0.6 or 0.7, the community is more likely to have a long life. Whether this is a coincidence or is related to the famous golden ratio (about 0.618) or the 30 : 70 Pareto Principle needs further research.
Our results potentially contribute to financial market analysis and decision-making. Furthermore, there are many problems that need further research. For example, from Figure 5, we can see that there are dense links between several communities and that some nodes link densely to more than one community. This implies that stock networks have a hierarchical and overlapping community structure. The study of hierarchical and overlapping communities in stock networks may reveal more interesting phenomena. We also studied some statistical properties of a single node, such as the probability of leaving its community, the lifetime of a single node in a dynamic community, and the ratio between its weight inside and outside a community, but we have not found a clear correlation between these properties. This may be due to the highly dynamic nature of the financial market or to the limits of our empirical data, and it will be studied in future research. In this paper, the empirical data are based on the Hong Kong stock market; the characteristics of dynamic communities in other regions or economies are also worthy of further study.

Figure 1: The data windowing method and parameters.

Figure 2: Experimental results for the mean correlation coefficient $\bar{\rho}(t)$ as a function of time.

Figure 3: (Color online) Contour plots of probability density functions for correlation coefficients with different $T$ values.

Figure 5: (Color online) Several pictures of stock networks $G^t$. Different node colors represent different communities, and node sizes reflect node degree.

Figure 6: Plots of several basic statistical properties of stock networks.

Figure 7: Probability distribution of the dynamic community age $\tau$.

Figure 9: Scatter plot of the correlation between the dynamic community age $\tau$ and the dynamic community stationarity $\zeta$; angle brackets $\langle \cdot \rangle$ denote the arithmetic average value.