Research Article Anatomy of Complex System Research



Introduction
"I think the next century will be the century of complexity," said Stephen Hawking, the famous theoretical physicist and cosmologist. Indeed, we are surrounded by systems that are unprecedentedly complex, containing many individual components together with interactions among them, possibly nonlinear ones [1]. Consider, for example, human society, whose survival requires cooperation among billions of individuals as well as communication infrastructures that connect and integrate human efforts. Another example is science. As a complex system, science contains millions of digitized papers, patents, grant proposals, and white papers, together with the citation or referencing relationships among them, catalyzing the rapid development of modern science and technology [2,3].
The above examples vividly illustrate that today's complex system research has implications for our society, from human mobility [4] to economics [5], world trade [6], and human interactions or collaborations [7,8].
Particularly with the emergence of network science at the dawn of the twenty-first century, complex system study has generated a substantial amount of original research [9]. It is also related to many revolutionary technologies of the twenty-first century, including Google, Facebook, and Twitter [10].
The applied value and substantial implications of complex system study raise interesting and important questions: Where does complex system research come from? How tightly connected are other academic areas and cutting-edge complex system research? Anecdotal evidence indicates that complex system study may be rooted in chaos theory, and it is also believed to be tightly connected to self-organization, agent-based modeling, and artificial intelligence. Despite that, we lack a systematic and empirical understanding of how complex system research affects or is affected by other academic fields. Thanks to the availability of large-scale bibliometric datasets and the development of science of science research, these questions can now be answered empirically. Science of science offers us unprecedented opportunities to explore and uncover citation dynamics [11][12][13], the knowledge space [14], collaboration networks [7,8,15], and their association with societal impact [16,17]. Indeed, recent empirical work on knowledge flow and scientist mobility among academic fields has attracted much attention, including studies of artificial intelligence [18], subfields of physics [19][20][21], and science in general [22,23], together with the diffusion model [24].
Here we leverage the Microsoft Academic Graph (MAG) dataset to study the reference and citation patterns between complex system research and other academic fields, exploring their relationships. Consistent with the anecdotal evidence, we find that complex system research was inspired by mathematics and physics before 2000. Moreover, due to the rapid development of computational technologies and the availability of large-scale datasets, complex system study has become largely connected with computer science since then. Finally, we show empirically that complex system study is inspired by a diverse set of disciplines compared with many other fields such as nuclear physics, and that this broadness in the reference list tends to be associated with a field's future scientific impact. Together, our study provides some of the first empirical evidence that complex system research is multidisciplinary and computational in nature. Because these characteristics of complex system research may translate into future scientific impact, our findings may have broad policy implications.

Results
The MAG dataset comes from Microsoft Academic Services [25,26] and contains more than 100 million papers from 1800 to 2018 together with their citation relationships. Moreover, the MAG dataset uses natural language processing to identify the academic fields of papers, resulting in a field tree with 19 coarse-grained fields (level 0) and more than 80,000 subfields (level 1 and level 2) spanning from biology to computer science. The citation relationships and paper field information, together with the sufficient time span, allow us to understand the evolution of scientific fields. Finally, as will be shown below, the MAG data is also useful to our study since it contains sufficient computer science papers, which are closely related to complex systems (there are 70,335 complex system papers in the MAG dataset).

Knowledge Production and Citation Networks.
Throughout the past 60 years, major technological changes and paradigm shifts, such as the development of chaos theory and computer science, have resulted in substantial knowledge production in the field of complex systems, including network science, data science, and computational social science [3,9,27]. To quantify the knowledge production, Figure 1(a) shows the yearly number of publications from 1960 to 2018. Consistent with prior studies [19,23], we find that science grows at an exponential rate, doubling every 12.3 years (Figure 1(a)). Complex system research also shows an exponential growth trend (Figure 1(a)). Comparing the knowledge production of complex systems with that of science as a whole, we find that complex system research shows steady growth relative to science. Specifically, the percentage of complex system papers grows from less than 0.01% to 0.06% from 1970 to the end of 2010, with a sudden burst between 2000 and 2010, representing the golden age of complex systems (Figure 1(b)). Hence, the field's exponential growth is driven not only by social needs but also by paradigm changes, such as the development of computer science and artificial intelligence. The results also agree with Stephen Hawking's famous quote mentioned at the beginning of the paper.
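The relation between an exponential fit and a doubling time can be illustrated with a minimal sketch. It assumes the fit is an ordinary least-squares line through the logarithm of the yearly counts (the fitting procedure is not specified in the text), and the counts are synthetic, not MAG data. The function names are hypothetical.

```python
import math

def fit_exponential_rate(years, counts):
    """Fit log(count) = intercept + rate * year by ordinary least squares
    and return the growth rate (assumed fitting procedure)."""
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (v - mean_log) for y, v in zip(years, logs))
    return sxy / sxx

def doubling_time(rate):
    """Years needed for the yearly output to double at a constant rate."""
    return math.log(2) / rate

# Synthetic yearly counts (not MAG data) constructed to double every 12.3 years.
years = list(range(1960, 2019))
counts = [1000.0 * 2 ** ((y - 1960) / 12.3) for y in years]

rate = fit_exponential_rate(years, counts)
estimated_doubling = doubling_time(rate)
```

With a growth rate r per year, the doubling time is ln 2 / r, so a doubling time of 12.3 years corresponds to r of roughly 0.056 per year.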
Following the substantial growth of the core complex system field, it is also interesting to study the growth of its related fields. The next question is as follows: How can we detect fields that are related to complex systems? Sam Edwards defined physics as "what physicists do" [28]. Following his definition, we expand the field of complex systems to a broad perspective by including papers that cite or were cited by complex system papers. We find that academic papers related to complex system research also show a similar exponential growth pattern, with a doubling time of 10.33 years (Figure 1(a)). One may argue that papers that cite or were cited by complex system papers may not belong to its related fields, since such citation relationships might come from noise or random connections in the whole citation network. To address this issue, we compare the citation patterns of all MAG papers to a null model in which each paper's references are assigned randomly given their publication time, regardless of a paper's journal or research field [16]. Specifically, in the randomized MAG citation network, we switch all citation links between papers but preserve the total number of references of each paper and the publication years of the referencing and referenced papers [16]. After that, we define a paper as related to complex systems if its relationships (references or citations) to the complex system field are significantly more numerous than expected by chance [19]. Following this procedure, we end up with more than 1.2 million papers that are related to complex systems, showing results consistent with Figure 1(a). The steady growth of complex systems and its related fields raises an interesting question: Where is complex system research located in the scientific knowledge space?
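The randomization described above can be sketched as follows. This is one possible way to satisfy the stated constraints, not necessarily the procedure of [16]: citation links are grouped by the pair (year of the citing paper, year of the cited paper), and the cited papers are permuted within each group. The function name and the toy data layout are assumptions made for illustration.

```python
import random
from collections import defaultdict

def randomize_citations(edges, year, seed=None):
    """Shuffle citation links while preserving, for every link, the citing
    paper and the publication years of both endpoints.

    edges: list of (citing_paper, cited_paper) pairs
    year:  dict mapping each paper to its publication year
    """
    rng = random.Random(seed)
    # Group links by (citing year, cited year) so that years are preserved.
    groups = defaultdict(list)
    for citing, cited in edges:
        groups[(year[citing], year[cited])].append((citing, cited))
    shuffled = []
    for pairs in groups.values():
        targets = [cited for _, cited in pairs]
        rng.shuffle(targets)  # permute cited papers within the group
        shuffled.extend(
            (citing, target) for (citing, _), target in zip(pairs, targets)
        )
    return shuffled
```

Because only the cited endpoints are permuted, every paper keeps its number of references, and as a side effect every paper also keeps its number of received citations. This simple version can produce duplicate links or, for same-year links, self-citations; a production implementation would need to reject or repair those cases.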
To visualize the knowledge graph, we extracted citation relationships among different fields to construct a field citation network (here, we use the MAG level-2 fields since complex system research belongs to this level). In the network, each node represents an academic field, and links between fields represent referencing or citation behaviors. Additionally, we use the number of citations/references between two fields as the link weight in the citation network. To eliminate noise and insignificant links, we applied a backbone extraction method to the whole field citation network [29]. While the citation network often shows strong community structure, with subfields belonging to the same ancestor citing each other much more frequently than those with different ancestor fields (Figures 1(c)-1(e)), many fields show strong interdisciplinary features. For example, chemistry lies between physics and biology, while engineering stands at the intersection of physics, mathematics, and computer science.
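The text does not say which backbone extraction method is used. As an illustration only, the sketch below implements the disparity filter, a widely used backbone method for weighted networks, on an undirected field network; whether this matches the method of [29] is an assumption, as are the function name and the significance level `alpha`.

```python
from collections import defaultdict

def disparity_backbone(weights, alpha=0.05):
    """Keep the links whose weight is significant for at least one endpoint.

    weights: dict mapping an undirected link (u, v) to its positive weight
    alpha:   significance level (hypothetical default)
    """
    strength = defaultdict(float)  # total weight attached to each node
    degree = defaultdict(int)      # number of links attached to each node
    for (u, v), w in weights.items():
        strength[u] += w
        strength[v] += w
        degree[u] += 1
        degree[v] += 1

    kept = {}
    for (u, v), w in weights.items():
        for node in (u, v):
            k = degree[node]
            if k <= 1:
                continue  # a single link cannot be tested from this endpoint
            p = w / strength[node]
            # Probability of a normalized weight this large if the node's
            # strength were split uniformly at random over its k links.
            if (1.0 - p) ** (k - 1) < alpha:
                kept[(u, v)] = w
                break
    return kept
```

For a hub with one link of weight 100 and three links of weight 1, only the heavy link survives at `alpha=0.05`, which is the intended behavior: links that carry a disproportionate share of a node's weight are retained.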
Looking across different periods, we find complex system research moved from mathematics in the 1980s to computer science in the 2000s (Figures 1(c)-1(e)), indicating a substantial paradigm shift towards computational research [30].
Our findings are consistent with the fact that the development of computer science, as well as the availability of large-scale datasets, has catalyzed complex system research, and possibly the emergence of network science as well. The knowledge production of the field of complex systems, together with its position in the whole knowledge graph, prompts us to ask the following: Which specific fields are closely related to complex systems, and how can we quantify the relationships between various fields rather than relying on visual inspection?

Knowledge Flow and Its Related Fields.
What are the origins of complex system research? Where does the knowledge created by complex system research go? The "linear model" of science demonstrates the importance of basic science to the development of applied research and technological development [31]. In the history of the complex system field, the study of nonlinear dynamics and chaos, which originated from mathematics, atmospheric science, and physics, has deepened our understanding of the underlying mechanisms of complex systems. Also, the development of complex system study contributes to the emergence of network science, data science, and computational social science. Therefore, we hope to understand the relevant knowledge flow. Does complex system theory emerge from mathematics or physics? Do computer science and engineering apply methodologies developed by the field of complex systems?
Figure 1: (a) Yearly number of publications for the whole Microsoft Academic Graph (green), the field of complex systems (red), and its related fields (blue). We fit each growth pattern using an exponential function, shown as dashed lines. (b) The fraction of the yearly number of complex system papers relative to the whole MAG, together with a linear fit. Field citation networks in the (c) 1980s, (d) 1990s, and (e) 2000s, with nodes being MAG level-2 fields and links being the citation/reference relationships between two fields. In order to eliminate insignificant links, we applied a backbone extraction method. Different colors represent the ancestor fields of these level-2 fields. We also annotate the field of complex systems in the citation network.
To investigate, we study the association between complex system research and various scientific fields through the relationship of references and citations. Specifically, we follow the definition of papers related to complex system research, and Figure 2(a) shows the distribution of their academic fields.
We find that computer science and mathematics are the top two most frequent fields related to complex system research, followed by the fields of biology, chemistry, and physics. Other fields like philosophy or art are less likely to have connections with complex system study (Figure 2(a)).
To eliminate the effect of the exponential growth of paper production (Figure 1(a)), we calculate the share of references made from complex system study to other academic fields, as well as from papers in other fields to complex system study. Following previous research [18,19], the reference share from field A to field B in year t is defined as
$$\text{reference share}_{A \to B}(t) = \frac{\#\,\text{refs from A papers to B papers in year } t}{\#\,\text{refs made by A papers in year } t}. \tag{1}$$
The reference share measurement controls for the number of references made by field A, thus controlling for its knowledge production (Figures 2(b) and 2(d)). Additionally, in order to control for the paper production in field B, the referenced field, we also compare the real number of references between fields A and B with the same quantity in the randomized citation network defined in the previous section. Specifically, we focus on the z-score between the two fields:
$$z_{A,B} = \frac{N_{A,B} - \mu_{A,B}}{\sigma_{A,B}},$$
where $N_{A,B}$ is the real number of connections between fields A and B, and $\mu_{A,B}$ and $\sigma_{A,B}$ represent the average number and the standard deviation of connections between fields A and B generated from multiple randomizations, respectively (Figures 2(c) and 2(e)). A large z-score means that fields A and B are more likely to be connected than in the null model. Before 2000, complex system research frequently cites papers from mathematics and physics (Figure 2(b)); we find similar results after controlling for the paper production in the referenced field (Figure 2(c)), suggesting that early complex system study was shaped by mathematics and physics. The result is also consistent with our intuition that complex system study is closely related to the discovery of dynamical system theory or nonlinear systems and nonequilibrium thermodynamics [32]. After 2000, however, complex system research strongly relies on computer science, indicating a paradigm shift to computational research, such as data science and computational social science [33]. Interestingly, the references from complex systems to chemistry and biology are largely driven by the dominant fraction of chemistry/biology papers with respect to the whole of science. In fact, complex system research is less likely to cite biology and chemistry compared to the null model (Figure 2(c)). How does complex system research affect other fields? We repeat the above analysis by calculating the reference share and z-score from other fields to complex system research. Several insightful patterns emerge.
First, complex system research also affects mathematics and physics significantly before 2000, while computer science substantially cites complex system research after 2000 with a steady growth pattern (Figures 2(d) and 2(e)). The results suggest that, in terms of knowledge flow, complex system research acts not only as a target but also as an important input. Interestingly, engineering depends strongly on the development of complex system research after 2005 (Figure 2(e)), suggesting the applied value of complex system research. The above analysis allows us to uncover the relationships between complex system study and other academic fields, prompting us to aggregate multiple years in order to investigate whether complex system study affects other fields more or vice versa. Figure 3 shows the z-score defined above for three different periods, i.e., the 1980s, 1990s, and 2000s. By comparing the z-score from the complex system field to other fields (Figure 3, red bars) with the same quantity from other fields to complex system research (Figure 3, blue bars), we find that the z-score from complex systems to the field of mathematics is higher than the other way around, suggesting that complex system research has consistently affected mathematics more since 1980 (Figure 3(a)). Computer science, however, shows comparable knowledge flow from and to complex system research (Figure 3(b)), suggesting that the two fields are extensively intertwined. Interestingly, engineering has been more affected by complex system research since 2000, indicating the applied value of complex system research (Figure 3(e)). Overall, the framework of knowledge flow allows us to uncover the statistical regularities of interactions between various scientific fields.
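The two measures used in this section, the reference share and the z-score against the null model, can be sketched in a few lines. The data layout (one pair of field labels per reference) and the function names are assumptions for illustration, and the population standard deviation is used because the text does not say which estimator is applied.

```python
import statistics

def reference_share(field_refs, source, target):
    """Share of the references made by `source` papers in a given year
    that point to `target` papers.

    field_refs: list of (citing_field, cited_field) pairs, one per reference
    """
    made = sum(1 for s, _ in field_refs if s == source)
    if made == 0:
        return 0.0
    hits = sum(1 for s, t in field_refs if s == source and t == target)
    return hits / made

def z_score(observed, randomized_counts):
    """Compare the observed number of links between two fields with the
    counts obtained from several randomized citation networks."""
    mu = statistics.mean(randomized_counts)
    sigma = statistics.pstdev(randomized_counts)  # assumed estimator
    if sigma == 0:
        raise ValueError("randomized counts have zero variance")
    return (observed - mu) / sigma
```

For example, if complex system papers make three references in a year and two of them go to mathematics, the reference share to mathematics is 2/3. A positive z-score means the two fields are linked more often than the randomized networks would suggest, and a negative one means less often.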

Reference and Impact Broadness.
Inspired by the results above, we discuss, in this section, the major characteristics of complex system research. The questions we would like to answer are as follows: Does complex system research draw references from a diverse set of fields? What is the impact of complex system research?
In order to answer these questions, we quantify the broadness of disciplines reflected in the references and citations of every paper that belongs to the same academic field using an entropy measurement [34]. Specifically, given a paper with a reference or citation list, we embed each reference/citation $\alpha$ into a vector of academic fields $d_\alpha = (d_{\alpha 1}, d_{\alpha 2}, \ldots, d_{\alpha k})$ based on the MAG level-1 fields. Then, the ith element of the paper's vector is given by
$$v_i = \sum_{\alpha=1}^{n} \sum_{\beta=1}^{k} \delta(d_{\alpha\beta}, i),$$
where $\delta$ is the delta function, equal to 1 when $d_{\alpha\beta} = i$ and 0 otherwise, k is the number of MAG level-1 fields a reference/citation belongs to, n is the number of references for a given paper, and $\alpha$ and $\beta$ are the indices looping over the paper's reference list and the field information of each reference, respectively. After embedding every paper into a vector, we perform L1 normalization on each vector and denote the result by p. We then calculate the normalized entropy for each paper as
$$E = -\frac{1}{\ln N} \sum_{i=1}^{N} p_i \ln p_i,$$
where N is the total number of level-1 fields in the MAG dataset. A paper with $E = 1$ has references/citations distributed equally across all fields (broad referencing and impact); a paper with $E = 0$ has references/citations solely from one field (deeply disciplinary). Finally, we average the entropy over all papers within the same academic field to get the field reference or impact broadness. First, Figures 4(a) and 4(b) show the distribution of field reference and impact broadness in 1995, documenting the narrow range of both reference and impact broadness, in contrast to the fat-tailed citation distribution [35].
Figure 2: Knowledge between complex system study and other academic fields. (a) The distribution of academic fields for papers that cite or are cited by complex system research at least once. (b, c) The reference share and z-score (compared to the null model) from complex system research to other academic fields. (d, e) The same as (b) and (c) but from other academic fields to complex system research.
Interestingly, complex system research is relatively diverse among all academic fields, as shown by the dashed line (Figures 4(a) and 4(b)). Moving beyond a single year, we repeat the same analysis for all academic fields from 1970 to 2018, finding similar results (Figures 4(c) and 4(d)). Comparing the reference or impact broadness of complex system research with the same quantities for nuclear physics, which is often considered a deeply disciplinary field [19], we find that complex system research consistently shows a higher value of reference or impact broadness (Figures 4(c) and 4(d)). Our results thus show the multidisciplinary nature of complex system research, suggesting that complex system research may offer a language in which many academic fields can interact with each other without borders.
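The broadness measure described in this section can be sketched as follows: field labels are counted over a paper's reference (or citation) list, the counts are L1-normalized, and the Shannon entropy is divided by the logarithm of the number of fields. The function name and the input layout (one list of level-1 field labels per reference) are assumptions for illustration.

```python
import math
from collections import Counter

def normalized_entropy(reference_fields, n_fields):
    """Normalized entropy of the field distribution of one paper.

    reference_fields: list with one entry per reference (or citation),
                      each entry being the list of fields it belongs to
    n_fields:         total number of fields N used for normalization
    """
    # Count how often each field appears across all references.
    counts = Counter(f for fields in reference_fields for f in fields)
    total = sum(counts.values())
    if total == 0 or n_fields < 2:
        return 0.0
    # L1-normalize the counts and compute the Shannon entropy.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_fields)

def field_broadness(papers, n_fields):
    """Average the per-paper entropy over all papers of one field."""
    scores = [normalized_entropy(refs, n_fields) for refs in papers]
    return sum(scores) / len(scores)
```

A paper whose references all fall in one field scores 0, and a paper whose references are spread evenly over all N fields scores 1. Fields that never appear contribute nothing to the sum, which is the usual convention that 0 ln 0 is taken to be 0.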

Reference Broadness and Scientific Impact.
The multidisciplinary nature of complex system research encourages us to explore the association between a field's reference broadness and its scientific impact, which has policy implications for funding agencies. To do so, we calculate, for each field, its reference broadness and its impact, which is measured by the average number of citations that papers in the field receive within 8 years of publication (C8). We find a significant positive correlation between field reference broadness and scientific impact (Figures 5(a) and 5(b)), suggesting that fields with more diverse reference lists are more likely to have more scientific impact in the future. To test the robustness of this association, we repeat the analysis from 1970 to 2008, finding consistent and significant results (Figure 5(c)).
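A minimal sketch of the impact measure and the correlation is given below. It assumes that "within 8 years" counts citations received from the publication year up to and including the eighth year afterwards, and that the correlation is a Pearson correlation; neither detail is stated in the text, and the function names are hypothetical.

```python
import math

def citations_within_window(pub_year, citing_years, window=8):
    """Number of citations received within `window` years of publication
    (boundary years included, an assumed convention)."""
    return sum(1 for y in citing_years if 0 <= y - pub_year <= window)

def field_impact(papers, window=8):
    """Average windowed citation count over the papers of one field.

    papers: list of (publication_year, list_of_citing_years)
    """
    scores = [citations_within_window(p, c, window) for p, c in papers]
    return sum(scores) / len(scores)

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In use, one would compute a broadness value and a C8 value for every field in a given year and pass the two lists to `pearson`, repeating the calculation year by year to obtain a correlation as a function of time.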
Finally, one might argue that different scientific fields often show different citation patterns. For example, biology and chemistry often have, on average, more citations than mathematics and engineering [35]. Moreover, citation behaviors change over time [11]. To address this concern, we use a simple linear regression that focuses on the effect of reference broadness on field impact while controlling for time, ancestor field categories, the number of publications within each field, and the average number of references. We find a consistent positive correlation between reference broadness and a field's scientific impact (Figure 5). To further eliminate the effect of the number of references, we ran the same regression analysis on subsamples with similar numbers of references, finding consistent results. The results can be interpreted as follows: a one standard deviation (SD) increase in reference broadness for a scientific field is associated with, on average, a 13.9% increase in future scientific impact.

Conclusions
Many of us who serve as reviewers for funding agencies or publishers may be confronted with questions such as the following: Does a particular work belong to the framework of complex system research? What is the nature of complex system study? In this paper, we present an anatomy of complex system research spanning from 1960 to 2018 by leveraging a large-scale bibliometric dataset. By investigating its knowledge production, its connections with other academic fields, and its reference or impact broadness, our results show the very nature of complex system research, i.e., its multidisciplinary, computational or data-driven, and applied nature. The empirical results are in line with much anecdotal evidence from textbooks and review articles [3,9]. Our results also go beyond case studies, which only focus on referencing/citation behaviors within a specific field. By investigating the connections between fields in the whole knowledge space, we systematically quantify the evolution of the focal field with respect to science. The significant positive association between fields' reference multidisciplinarity and their future scientific impact may have policy implications, from individual departments to funding agencies and governments. In general, our results are consistent with prior findings that the isolation of certain scientific fields brings significant impact penalties [19]. In order to foster and nurture the multidisciplinary science that is key to many real-world problems, funding agencies and governments may support fields with a multidisciplinary nature. From the perspective of individual scientists [20], our results provide actionable strategies for choosing future research areas [14,36,37] and mentors [38]. By choosing research areas that are able to link to their surrounding fields, individual scientists are better positioned to achieve future scientific impact. A field isolated from its environment, however, is destined for extinction [19].
Our understanding of the story that complex system research tends to be inspired by a diverse set of academic areas is not without limitation.
There are still several remaining issues for future study. First, while this work considers large-scale citation relationships among scientific fields, it lacks other dimensions, including scientist mobility among various fields [20] and text similarity between different papers [39]. Such limitations offer a chance to go beyond our current understanding and ask further questions: How can we integrate citations, individual scientists, and text features among academic fields? Given that many important discoveries emerge from the intersections of various fields [16], the analysis of the knowledge graph using large-scale datasets could help us uncover or predict emerging or promising research areas. Finally, given the intense connections between science and technology, future work should investigate the association between fields' multidisciplinarity and their roles in advancing technologies.
Figure 5: The association between a field's reference broadness and its scientific impact. The correlation between field reference broadness and its impact for 1995 (a) and 2005 (b). (c) The correlation as a function of time. (d) Linear regression results with the field's scientific impact as the dependent variable and the field's reference broadness as the focal independent variable. Standard errors are clustered at the MAG level-0 field level. ***p < 0.001, **p < 0.05, and *p < 0.1.

Data Availability
This paper uses the MAG dataset, which is publicly available from https://aka.ms/msracad.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.