A New Quantitative Definition of the Complexity of Organized Matters

One of the most fundamental problems in science is to deﬁne the complexity of organized matters quantitatively, that is, organized complexity. Although many deﬁnitions have been proposed toward this aim in previous decades (e.g., logical depth, eﬀective complexity, natural complexity, thermodynamics depth, eﬀective measure complexity, and statistical complexity), there is no agreed-upon deﬁnition. The major issue of these deﬁnitions is that they captured only a single feature among the three key features of complexity, descriptive, computational, and distributional features, for example, the eﬀective complexity captured only the descriptive feature, the logical depth captured only the computational, and the statistical complexity captured only the distributional. In addition, some deﬁnitions were not computable; some were not rigorously speciﬁed; and any of them treated either probabilistic or deterministic forms of objects, but not both in a uniﬁed manner. This paper presents a new quantitative deﬁnition of organized complexity. In contrast to the existing deﬁnitions, this new deﬁnition simultaneously captures all of the three key features of complexity for the ﬁrst time. In addition,the proposeddeﬁnition is computable, is rigorously speciﬁed,and can treatboth probabilisticand deterministic forms of objects in a uniﬁed manner or seamlessly. The proposed deﬁnition is based on circuits rather than Turing machines and ε -machines. We give several criteria required for organized complexity deﬁnitions and show that the proposed deﬁnition satisﬁes all of them.


Introduction
1.1. Background. More than seven decades ago, an American scientist, Warren Weaver, classified scientific problems into three classes: problems of simplicity, problems of disorganized complexity, and problems of organized complexity [1]. For example, classical dynamics can be used to analyze and predict the motion of a few ivory balls as they move about on a billiard table. is is a typical problem of simplicity. Imagine then, a large billiard table with millions of balls rolling over its surface, colliding with one another and with the side rails. Although to be sure the detailed history of one specific ball cannot be traced, statistical mechanics can analyze and predict the average motions. is is a typical problem of disorganized complexity. Problems of organized complexity, however, deal with features of an organization such as living things, ecosystems, and artificial things. Here, cells in a living thing are interrelated into an organic whole in their positions and motions, whereas the balls in the above illustration of disorganized complexity are distributed in a helter-skelter manner.
In the tradition of Lord Kelvin, the quantitative definition of complexity is the most fundamental and important notion in problems of complexity [2].
"I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of science, whatever the matter may be." Lord Kelvin, 1883 e quantitative definition of disorganized complexity of physical systems has been established to be entropy, which is defined in thermodynamics and statistical mechanics. In a similar manner, the disorganized complexity of information sources (distributions) can be quantitatively defined by Shannon entropy [3,4].
In contrast, there is no agreed-upon quantitative definition of organized complexity. e difficulty comes from the notion that organized complexity could be greatly dependent on our senses and the context or that the objects of organized complexity like living things, ecosystems, and artificial things may be recognized only by intelligent organisms such as human beings and they are often heavily dependent on the context in the environments (e.g., previous knowledge and understanding), that is to say, it is vastly different from the measures of disorganized complexity such as entropy that simply quantifies the randomness of the objects.
We may therefore wonder whether such sensory, vague, and context-dependent things can be rigorously defined in a unified manner covering various living things to artificial things. Many investigations nonetheless have been pursued toward this aim in the last decades, for example, logical depth by Bennett [5], effective complexity by Gel-Man [6][7][8], natural complexity by Gentili et al. [9][10][11][12], thermodynamic depth by Lloyd and Pagels [13], effective measure complexity by Grassberger [14], and statistical complexity by Crutchfield et al. [15][16][17][18], although no existing definition has been agreed on in the field [2,19,20].
In Section 4, we explain our understanding of why no existing definition is satisfactory to be agreed upon. Roughly speaking, these definitions have the following problems (Section 4.3).
(i) ey captured only a single feature among the three key features of complexity; descriptive, computational, and distributional features. e effective complexity (and Kolmogorov complexity [21]) captured only the descriptive feature; the logical depth captured only the computational; and the statistical complexity and effective measure complexity captured only the distributional. (ii) ey treated either a probabilistic or deterministic form of objects, but none of them could treat both forms of objects in a unified manner or seamlessly. (iii) Some definitions were not computable. (iv) Some definitions were not rigorously specified.
We then have the following question: Is there any quantitative definition of organized complexity with all of the following properties?
(i) It simultaneously captures all of the descriptive, computational, and distributional features (ii) It can treat both probabilistic and deterministic forms of objects in a unified manner or seamlessly (iii) It is computable (iv) It is rigorously specified 1.2. Contribution. In this paper, we positively answer this question.
e major results of this paper are as follows: (1) Summary is paper presents a new quantitative definition of organized complexity. In contrast to existing definitions, this definition simultaneously captures the three key features of complexity: descriptive, computational, and distributional features. It is also computable, is rigorously specified, and can treat both probabilistic and deterministic forms of objects in a unified manner or seamlessly.
(2) e Proposed Definition of Organized Complexity e proposed definition is based on circuits [22,23] rather than Turing machines [5-8, 21, 22] and ϵ-machines [15][16][17][18]. e new definition for an object (distribution) is given by the shortest size of a stochastic automaton form of circuit, oc-circuit, for simulating the object (i.e., it is based on Occam's razor principle and rigorously specified). Here, we show the proposed definition roughly (see Section 5.2 for the precise definition).
(a) e oc-circuit (stochastic automaton form of circuit) is defined as below. Let C be a circuit with N input bits and L output bits (C: 0, 1 { } N ⟶ 0, 1 { } L ), and let C be a specified form of C. (C, u, n, (m 1 , . . . , m K )) is oc-circuit C such that s i+1 , y i ⟵ C(u, ·) ⟵ s i , m i , r i , i � 1, 2, . . . , K, Y ≔ y 1 , . . . , y K n , y 1 , . . . , y K n is the n-bit prefix of y 1 , . . . , y K , (see Section 1.3) where s i is a state at step i, u is an a priori input (universe), m i is an input at step i, r i is random bits at step i, K ≔ ⌈n/L⌉, and L � |y i | (bit length of y i ) for all i (Definition 1). (b) Organized complexity OC of an object, distribution X over n bit strings, at precision level δ (0 ≤ δ < 1) is defined as follows: where C is an oc-circuit, and X δ ≈ Y⟵ R C means the output (simulation result) Y of occircuit C is close (with precision level δ) to object X (|C| denotes the bit length of C ). at is, the organized complexity of object X is the shortest size of oc-circuit C that simulates X (with precision level δ; Definition 2).
(3) Properties of the Proposed Definition (See Section 5.3 for the Precise Descriptions) (a) For any distribution X over 0, 1 { } n (n ∈ N) and any precision level δ > 0, (the proposed definition of organized complexity) OC(X, δ) can be computed ( eorem 1). (b) Any ϵ-machine can be simulated by an oc-circuit (as a special case with no descriptive nor computational feature; eorem 2). (c) We introduce a structured variant of the proposed definition, OC S (X, δ), for object X and 2 Complexity precision level δ (Definition 3) and show the following properties: (i) For any distribution X over 0, 1 { } n (n ∈ N) and any precision level δ > 0, OC S (X, δ) can be computed ( eorem 3). (ii) Let OC S NAND { } (X, δ) be defined on NANDbased circuits, while OC S (X, δ) is defined on (AND, OR, NOT)-based circuits. en, for a constant O(1) in n, ( eorem 4) (4) Comparison We give five criteria required for organized complexity definitions in Section 3.3 and summarize the comparison of our definition (Section 5) with the existing definitions (Section 4) in Table 1 in terms of the criteria and Table 2 in terms of the approach and machinery. ( e comparison is described in Sections 4.3 and 5.1.) As shown in Table 1, the proposed definition satisfies all of the criteria for the first time. As shown in Table 2, the proposed definition is based on Occam's razor principle and the oc-circuit machinery. As a result, it is rigorously specified, is computable, and captures all three key features of complexity.

Notations.
e sets of natural, rational, and real numbers are denoted by N, Q, and R, respectively. e set of n-bit strings is denoted by 0, 1 { } n (n ∈ N) and 0, 1 { } * ≔ ∪ n∈N 0, 1 { } n , and the null string (0 bit string) is denoted by λ. When x ∈ 0, 1 { } * , |x| denotes the bit length of . When x ∈ R, ⌈x⌉ denotes the smallest integer greater than or equal to x.
When x is a variable and y is a value, x ≔ y denotes that x is substituted or defined by y. A probability distribution over 0, 1 When A is a probability distribution, or the source (machinery) of the distribution, a⟵ R A denotes that element a ∈ 0, 1 { } n is randomly selected from A according to its probability distribution. When A is a set, a⟵ U A denotes that a is randomly selected from A with a uniform distribution. When X and Y are two distributions, the statistical distance of X and Y, SD(X, Y), is defined by that SD(X, Y) is bounded by δ. en we say X and Y are statistically δ-close. When Y is a distribution, (Y) n denotes the n bit restriction (n bit prefix) of Y, that is, (Y) n is a distribution over 0, 1 { } n and Pr[y⟵ When S is a set, #S denotes the number of elements of S.

Objects
e existing complexity definitions can be categorized into two classes. One is a class of definitions whose objects are deterministic strings, and the objects of the other class of definitions are probability distributions. As for the traditional complexity definitions, the Kolmogorov complexity is categorized in the former, and entropy is in the latter. Among the above-mentioned organized complexity definitions, logical depth and effective complexity are in the former, and thermodynamic depth, effective measure complexity, and statistical depth are in the latter.
Which is more appropriate as the objects of organized complexity? e objects of complexity are everything around us, stars and galaxies in space, living things, ecosystems, artificial things, and human societies. e existence of everything can be recognized by us only through observations. For example, the existence of many things are observed through devices such as the telescope, microscope, and various observation apparatus and electronic devices. We can take things around us directly into our hands and sense them, but they are also recognized by our brains as electronic nerve signals transmitted from the sensors of our five senses through the nervous system. at is, all objects of complexity are recognized as the result of observations by various apparatus and devices, including the human sensors of our five senses.
Since quantum mechanics governs the microworld, observed values are determined in a probabilistic manner.
is is because observed values (data) obtained when observing microphenomena (quantum states) in quantum mechanics are randomly selected according to a certain probability distribution corresponding to a quantum state (e.g., entangled superposition).
How, then, are observed values in macrophenomena? For example, if some sort of radio signals are received and measured, they would almost certainly be accompanied by noise. ere are various reasons why noise becomes mixed in with signals, and one of them is thought to be the probabilistic phenomena of electrons, thermal noise. Similarly, various types of noise will be present in the data obtained when observing distant astronomical bodies. is can be caused by the path taken by the light (such as through the atmosphere) and factors associated with the observation equipment.
Even in the case of deterministic physical phenomena, chaos theory states that fluctuations in initial conditions can lead to diverse types of phenomena that behave similarly to those of random systems. In short, even a deterministic system can appear to be a quasi-probabilistic system. However, even a system of this type can become a true (nonquasi) probabilistic system if initial conditions fluctuate due to some noise, for example, thermal noise. ere are also many cases in which a quasi-probabilistic system associated with chaos cannot be distinguished from a true probabilistic system depending on the precision of the observation equipment. Here, even quasi-probabilistic systems may be treated as true probabilistic systems.

Complexity 3
us, when attempting to give a quantitative definition of the complexity of observed data, the source of those observed data would be a probability distribution, and the observed data themselves would be randomly selected values according to that distribution. If we now consider the complexity of a phenomenon observed using certain observation equipment, the object of this complexity should not be the observed data selected by chance from the source but rather the probability distribution itself corresponding to the source of the observed data.
It is known that some parts of genome patterns appear randomly distributed over a collection of many samples (over generations). Here, we can suppose a source (probability distribution) of genome patterns from which each genome pattern is randomly selected. Also, in this case, the object of complexity should not be each genome pattern but rather the source (probability distribution) of the genome patterns. erefore, hereafter, we consider that an object of complexity is a probability distribution in this paper. Here, note that we treat a deterministic string as an object since a deterministic string is a special case of a probability distribution (where only a value occurs with probability 1 and the others with 0).
How can we identify a source or a probability distribution from observed data? It has been studied as the model selection theory in statistics and information theory, for example, AIC (Akaike's information criterion) by Akaike [24] and MDL (minimum description length) by Rissanen [3,25,26]. Here, given a collection of data, find the most likely and succinct model (source, i.e., probability distribution) of the data.
In this paper, however, it is outside the scope, that is, we do not care how to find such a source from a collection of observed data (through the model selection theory). We here suppose that a source (probability distribution) is given as an object of organized complexity and focus on how to define the complexity of such a given source quantitatively.
ere are roughly two types of observed data: one type is data observed at a point in time, and the other is time-series data. Genome pattern data are examples of the former, and data obtained from an observation apparatus for a certain period are examples of the latter. In any case, without loss of generality, we here assume that observed data x is bounded and expressed in binary form, that is, x ∈ 0, 1 { } n for some n ∈ N, since any physically observed data have only finite precision (with rounding error). en the source of the observed data, X, which is an object of organized complexity in this paper, is a probability distribution over 0, 1

Criteria
In this section, we will show the criteria required for the quantitative definitions of organized complexity.

Examples.
To begin with, let us consider the following example. We give a chimpanzee a computer keyboard and prompt the chimpanzee to freely hit the keys, resulting in a string of characters. Let us assume an output of 1,000 alphabetical characters. At the same time, we select a string of  4 Complexity 1,000 characters from one of Shakespeare's plays. Naturally, the character string input by the chimpanzee is gibberish possessing no meaning, which undoubtedly makes it easy for us to distinguish that string from a portion of a Shakespearean play. However, is there a way to construct a mathematical formulation of the difference between these two strings that we can easily tell apart? Why is it so easy for us to distinguish these two strings? e answer is likely that the chimpanzee's string is simply random (or disorganized) and meaningless to us, while Shakespeare's string is highly organized and meaningful. In short, if we can mathematically define the amount of organized complexity (or meaningful information), we should distinguish between these two strings.
What then are the sources of the observed data, the chimpanzee's string and Shakespeare's string. Let us return to the source of the chimpanzee's string creation without thinking of it as simply a deterministic string.
Here, for simplicity, we suppose that the scattered hitting of keys by the chimpanzee is the same as a random selection of hit keys. At this time, the source of the chimpanzee's string is the probability distribution in which any particular 1,000-character string can be randomly selected from all possible 1,000-character strings with equal probability.
What, then, would be the source of Shakespeare's string? We can surmise that when Shakespeare wrote down this particular 1,000-character string, a variety of expressions within his head would have been candidates for use and that the 1,000-character string used in the play would have finally been selected from those candidates with a certain probability. e candidates selected must certainly be connected by complex semantic relationships possessed by English words. Such complex semantic relationships of human language have been extensively studied by the modeling with complex networks [27]. Accordingly, the source of Shakespeare's string must be the complex probability distribution of candidate expressions connected by complex semantic relationships.
Here, note that the source of Shakespeare's string (distribution of candidate expressions within his head) is an imaginary example of distribution. As we will observe below, we may easily perceive the features of the organized complexity in such an example. As mentioned in Section 2, this paper supposes a source is given and focuses on defining the complexity of the given source. Hence, it is outside the scope of this paper how to find the source (distribution) of Shakespeare's strings (for practical applications of our complexity definition, however, we may need to consider how to identify a source. See Section 6.3.) We now consider three imaginary examples, Cases 1, 2, and 3. In Case 1, candidate expression 1 has the probability of 0.017, candidate expression 2 has the probability of 0.105, . . ., candidate expression 327 has the probability of 0.053. e other expressions have the probability of 0, that is, hundreds of candidate expressions occurred in his head consciously or unconsciously. Finally, one of them was randomly chosen according to the distribution. As more simplified cases, Case 2 is where candidate expression 1 has the probability of 2/7, candidate expression 2 has the probability of 5/7, and the others have the probability of 0, that is, only two candidate expressions occurred in his head. Finally, one of them was randomly chosen. Case 3 is where only a single expression has the probability of 1 and the others have the probability of 0, that is, a deterministic string case; he selected the expression without hesitation. ese expressions and distributions should be highly organized and structured with complex semantic relationships.

ree Key Features.
We can find three key features, descriptive, computational, and distributional features, in the examples of Shakespeare's string above.
Case 1 may have a highly complicated distribution with complex semantic relationships. e distributional feature of complexity captures such distributional complexity. In contrast, Case 3 has no such feature since it is deterministic. (Note that the distributional feature of complexity for distribution can be represented by the statistical complexity, Section 4.2.3, with an ϵ-machine.) Instead, Case 3 (and Cases 1 and 2) may have two other features, descriptive and computational ones. e descriptive feature of complexity captures how much description (information or knowledge) is needed to produce an expression. For example, Shakespeare should need a tremendous amount of knowledge about culture, history, and language to write his plays. Its complexity (amount of knowledge) can be characterized as the descriptive feature. e complexity of the plays' semantic information (story, etc.) can also be characterized as the descriptive feature. (Note that the descriptive feature of complexity for a deterministic string can be represented by the Kolmogorov complexity, Section 4.1.1, with a Turing machine.) e computational feature of the complexity of an expression captures how much amount of computation (work) is needed to produce the expression. Suppose that Shakespeare came up with an idea of what he wanted to express. He then tried to find a proper literary expression or rhetoric. It might be easy (less thinking) for him in some cases and might require much work (hard thinking) in other cases. As mentioned above, we can analyze complex semantic relationships of human language with complex network models [27]. To find a proper expression with complex semantic relationships can be modeled as solving a mathematical problem in complex networks for human language. We can consider such computational complexity (to find a proper expression or to solve a mathematical problem in complex networks) as the computational feature. (Note that the computational feature of complexity for a deterministic string can be represented by the logical depth, Section 4.1.2, with a Turing machine.) In general, the organized complexity of an object (e.g., Case 1) can have all three key features, descriptive, computational, and distributional features. erefore, a desirable quantitative definition of organized complexity should capture these three features simultaneously.

Criteria for the Quantitative Definition of Organized
Complexity. Considering the observation mentioned above, we provide the following criteria for specifying the quantitative definition of organized complexity.
(1) Object (Prob. and Det.) e objects of the complexity definition should be probability distributions. In addition, the complexity definition should treat any deterministic strings (as a special case of distributions) and any general distributions in a unified manner or seamlessly. For example, it should treat Cases 1 and 2 (probabilistic distributions) and Case 3 (deterministic strings) of Section 3.1 in a unified manner or seamlessly.

(3) Simply Random
Simply random objects, "problems of disorganized complexity" by Weaver [1], for example, the source of the chimpanzee's string, should have low-organized complexity.

(4) Key Features of Organized Complexity
Highly organized objects, "problems of organized complexity" by Weaver [1], for example, the source of Shakespeare's string (Cases 1, 2, and 3), should have high-organized complexity. In addition, the organized complexity definition should simultaneously capture all of the three key features (descriptive, computational, and distributional) of complexity.

(5) Computability
Given an object, the organized complexity of the object should be computable (or recursive in the computation theory).

Existing Quantitative Definitions of Complexity
We now survey the typical quantitative definitions of organized complexity in the literature using the criteria above.
Here, we can categorize them into two classes. One is a class of definitions whose objects are deterministic strings, and the objects of the other class of definitions are probability distributions.
Roughly, the Kolmogorov complexity of string x is the size of the shortest program (on a Turing machine) to produce string x, that is, it is based on Occam's razor principle (see (iv) in Section 4.3) and rigorously (uniquely) specified. is definition captures the descriptive feature of the complexity of deterministic strings.
More precisely, let U be a reference universal prefix Turing machine (see [21] for the reference universal prefix machine). en, the Kolmogorov complexity, Note that the Kolmogorov complexity is a quantitative definition for disorganized complexity or the degree of randomness for deterministic strings (the counter notion of entropy for distributions). e criteria in Section 3.3 show that the Kolmogorov complexity has the following properties (summarized in Tables 1 and 2 [5] introduced logical depth with the intuition that complex objects are those whose most plausible explanations describe long causal processes. To formalize the intuition, Bennett employs the methodology of algorithmic information theory, the Kolmogorov complexity. e logical depth can be considered as one of the (computational) resource-bounded variants of the Kolmogorov complexity [21]. e logical depth of a deterministic string, Bennett's definition for measuring organized complexity, is dependent on the running time of the programs that produce the string and whose length is relatively close to the minimum in a sense. is definition captures the computational feature of the complexity of deterministic strings.
More precisely, the logical depth of string x at signifi- where we define m t (x) and m(x) by 6 Complexity that is, it is based on Occam's razor principle (see (iv) in Section 4.3) and rigorously (uniquely) specified. Here, U is the reference universal prefix Turing machine (for the Kolmogorov complexity), and U t is a specific class of U whose running time is bounded by t steps. l(p) is the length of program p. In terms of the criteria shown in Section 3, the logical depth has the following properties (summarized in Tables 1  and 2 [31] and Kt complexity [21,32], have the same drawback as the logical depth because they also focus on the computational feature and cannot capture the descriptive feature of complexity. (5) Computability: e logical depth of an object (string) is not computable because it is based on the Kolmogorov complexity or Turing machines (bad).

Effective Complexity.
Effective complexity [6,7] was introduced by Gell-Mann and is based on the Kolmogorov complexity. To define the complexity of an object, Gell-Mann considers the shortest description of the distribution in which the object is embedded as a typical member. Here, "typical" means that the negative logarithm of its probability is approximately equal to the entropy of the distribution. e effective complexity captures the descriptive feature of complexity since it is defined by the Kolmogorov complexity of the distribution induced by an object (deterministic string). e effective complexity of string x is where K(E) is the Kolmogorov complexity of distribution E, that is, the length of the shortest program to list all members, r, of E together with their probabilities, Pr E (r), and H(E) is the Shannon entropy of E, that is, it is based on Occam's razor principle (see (iv) in Section 4.3) and rigorously (uniquely) specified. In terms of the criteria shown in Section 3, the effective complexity has the following properties (summarized in  Tables 1 and 2).
(1) Object (Prob. and Det.): e objects are only deterministic strings (bad). Technically, however, we can consider distribution E to be an object of the effective complexity in place of x.
(2) Simply Regular: Simple (or very regular) objects have low effective complexity (good).

Natural Complexity.
ere are various definitions and frameworks of natural complexity in the literature [9,11,12,[33][34][35][36][37]. Here, we focus on the definition in [9] since it has the most general form of natural complexity among them. e natural complexity (NaC) [9] is a multivariate function that depends on the following: (i) e multiplicity (Mu) of the network, which is the number N of nodes. e nodes are described by the temporal functions of the type: (ii) e interconnection (Ic) of the network, which is the number L of links among the nodes. All the potential edges can be represented by the N × N adjacency matrix A that describes the system's wiring and the interaction strength between the nodes. (vi) e variability of links (VaL), which depends on how much the coefficients of the adjacency matrix A (i.e., a 11 (t), a 12 (t), . . . , a NN (t)) change over time. (vii) e integration (Ig) of the nodes' and links' features, which gives rise to those properties that are called emergent because they belong to the entire network.
As has been sustained in the previous paragraph, the emergent properties depend on the network's architecture. e network's topology can be inferred from parameters such as the degree distribution (P(d)), the clustering coefficient (C) of the nodes, and the mean path length (sp) of the network. Furthermore, the emergent properties depend on the environment surrounding the system as proved by the Darwinian evolution of life [38].
In synthesis, natural complexity (NaC) results to be a seven-variable function of the type: Although the natural complexity (NaC) provides a good framework for studying the complexity of various systems expressed with network forms as shown in [9] and Section 6.2, it has the following problems as a quantitative complexity definition.
(1) NaC is a framework of definitions but not a rigorous definition because there is no concrete description/ specification of f, DiN, DiL, VaN, VaL, and Ig (P(d), C, sp, etc.) and there is no way to compute a concrete value of NaC given an object (network expression). Even if NaC were concretely specified (by specifying f, DiN, DiL, VaN, VaL, and Ig (P(d), C, sp, etc.)), it would be ad hoc, and there would be many other concrete specifications of NaC. erefore, it would be hard to specify NaC uniquely.
Note that this drawback is shared with the thermodynamic depth to be introduced below, which is not rigorously specified due to the arbitrariness of macroscopic states. (2) e object of NaC is restricted to network expressions but not general. Network expressions have been used broadly for the characterization and analysis of complex systems. However, individual models or more direct analyses for complex systems rather than network expressions have also been employed [10,[39][40][41]. erefore, a complexity definition is desirable to give no particular restriction on the objects. Otherwise, we would need many complexity definitions depending on different classes of objects, for example, network expressions and other individual models.
Here, note that if the object of a complexity definition is general (as described in pointer (1) of Section 3.3), it can treat any particular class of the object, for example, deterministic network expressions, as shown in Section 6.2. at is, if the object of a complexity definition is general, it can be applied to any particular class of the object, but if the object is restricted, it cannot be applied to objects outside the restriction. erefore, the natural complexity (NaC) is not rigorously specified, and it is not easy to characterize the definition in terms of the criteria in Section 3 (summarized in Table 2).

Objects Are Probability Distributions
e thermodynamic depth was introduced by Lloyd and Pagels [13] and shared some informal motivation with logical depth, where complexity is considered a property of the evolution of an object.
We now assume the set of histories or trajectories that result in object (distribution) S 0 . A trajectory is an ordered set of macroscopic states (distributions) S −L−1 , . . . , S −1 , S 0 . e thermodynamic depth of object S 0 is where H(A, . . . , B|C) is the conditional entropy of combined distribution (A, . . . , B) with condition C. e thermodynamic depth has the following problems as a quantitative complexity definition.
(1) e thermodynamic depth is not rigorously defined.
is is because it is not defined how long the trajectories (what value of L ) should be, and there is no description on how to select macroscopic states in the definition [13]. If there are thousands of possible sets of macroscopic states, we would have thousands of different definitions of the thermodynamic depth.
(2) e object is essentially limited by the selection of macroscopic states. In order to define the complexity of an object S 0 , a set of macroscopic states S whose complexity is comparable to or more than that of S 0 should be established beforehand. Hence, if S is fixed, or the thermodynamic depth of S is concretely defined, it cannot measure the complexity of an object whose complexity is more than that of S. at is, any concrete definition of this notion can measure only a restricted subset of objects, that is, any concrete and generic definition is impossible in the thermodynamic depth.
erefore, the thermodynamic depth is not rigorously specified, and it is not easy to characterize the definition in terms of the criteria in Section 3 (summarized in Table 2).

Effective Measure Complexity.
e effective measure complexity was introduced by Grassberger [14] and measures the average amount by which the uncertainty of a symbol in an (infinite) random string (distribution) decreases due to the knowledge of previous symbols. e effective measure complexity captures the distributional feature of complexity.
For distribution X N over 0, 1 Complexity is difference quantifies the perceived randomness, which, after further observation, is discovered to be order [19]. It is information theoretically and rigorously (uniquely) specified.
In terms of the criteria shown in Section 3, the effective measure complexity has the following properties (summarized in Tables 1 and 2). ganized objects that focus on the descriptive or computational feature (e.g., highly organized complex objects (distributions), X N N∈N , that are almost deterministic and generated from (compressed) long strings or from small strings through very long running-time computation), their effective measure complexities are low since they are almost deterministic (bad). In other words, the effective measure complexity cannot capture the descriptive and computational features of complexity, while it captures the distributional feature. (5) Computability: e effective measure complexity of an object (distribution) is computable since it is information theoretically defined and not based on a Turing machine (good).

Statistical Complexity.
e statistical complexity was introduced by Crutchffeld and Young [15]. Here, to define the complexity, the set of causal states S and the probabilistic transitions between them are modeled in the so-called ϵ-machine, which produces a stationary distribution of causal states, D S . e mathematical structure of the ϵ-machine is a stochastic finite-state automaton or hidden Markov model. e statistical complexity of object (distribution) is the minimum value of the Shannon entropy of D S , where the object is equivalent to the output of the ϵ-machine, that is, it is based on Occam's razor principle (see the footnote in Section 4.3) and rigorously (uniquely) specified because the minimum value of the Shannon entropy corresponds to the shortest size of expression. It captures the distributional feature of complexity.
Let S i for i � 1, .., k be causal states, S ≔ S 1 , . . . , S k , and T ij be the probability of a transition from state S i to state S j , that is, T ij ≔ Pr[S j |S i ]. Each transition from S i to S j is associated with an output symbol, σ ij (e.g., σ ij ∈ 0, 1 { }). en, Pr[S i ], the probability that S i occurs in the infinite run of the ϵ-machine, is given by the eigenvector of matrix T ≔ (T ij ), since . Hence, the machine produces a stationary distribution of states, D S . e output of the ϵ-machine is the infinite sequence of σ ij induced by the infinite sequence of the transition of states.
at is, ϵ-machine outputs a distribution, Σ S , over 0, 1 { } ∞ , induced by D S . e statistical complexity of object X (distribution), denoted C 1 , is the minimum value of the Shannon entropy of D S , H 1 (D S ), when Σ S � X: A more generalized notion, C α (0 ≤ α ≤ ∞), is defined by the Reny entropy of D S in place of the Shannon entropy, that is, where C 1 is the case where α ≔ 1 as H 1 is the Shannon entropy and C 0 ≔ min log #S|Σ S � X (α ≔ 0; #S is the number of elements of set S; for α ≔ ∞, H ∞ is the minientropy). In terms of the criteria shown in Section 3, the statistical complexity has the following properties (summarized in Tables 1 and 2).
(1) Object (Prob. & Det.): Statistical complexity C α for any α (0 ≤ α ≤ ∞) can measure only probability distributions as objects, since for any deterministic string, there is only one state S 1 with this deterministic string and Pr[S 1 ] � 1, that is, C α � 0 (0 ≤ α ≤ ∞). us, none of C α can treat probability distributions and deterministic strings in a unified manner (bad).
(2) Simply Regular: Simple (or very regular) objects have low statistical complexity (good).
(3) Simply Random: Simply random objects have low statistical complexity (good). (4) Key Features of Organized Objects: For highly organized objects that focus on the descriptive or computational feature, their statistical complexities are low, that is, the statistical complexity cannot capture the descriptive and computational features (bad). Here, we show two typical cases below: (i) For highly organized complex objects (on the descriptive or computational feature) that are almost deterministic and are generated from (compressed) long strings (information) or from small strings through very long running-time computations), their statistical complexities are low. (ii) For a highly organized complex object (on the descriptive or computational feature) characterized by an ϵ-machine with (T, S, D S , σ ij ) that has low statistical complexity, C α , and that has σ ij generated from (compressed) long Complexity 9 strings (information) or from small strings through very long running-time computations), its statistical complexity is low. Note that C α depends on only D S but is independent of σ ij .
at is, the statistical complexity cannot capture the descriptive and computational features of complexity, while it captures the distributional feature. See Remark 3 for more precise observation about the properties of the statistical complexity. (5) Computability: e statistical complexity of an object is computable since it is information theoretically defined and not based on a Turing machine (good).

Summary.
In summary, the existing definitions have the following properties and shortcomings (see Tables 1 and 2): (i) Any existing complexity definition can capture only a single feature of complexity; for example, the Kolmogorov complexity and effective complexity capture only the descriptive feature, the logical depth captures only the computational, and the statistical complexity and effective measure complexity capture only the distributional. (ii) Any existing complexity definition can treat either a probabilistic or deterministic form of objects, but none of them can treat both in a unified manner or seamlessly. (iii) e definitions on Turing machines (Kolmogorov complexity, effective complexity, and logical depth) are not computable. e definitions not on Turing machines (the effective measure complexity and the statistical complexity) are computable, but they can only capture the distributional feature (but not descriptive and computational ones) due to the lack of computational machinery in the definitions. Note that an ϵ-machine for the statistical complexity, which produces a stochastic or hidden Markov process, is information-theoretical but not computational machinery. (iv) We can rigorously specify the complexity in Occam's razor principle and information-theoretical approaches (Kolmogorov complexity, effective complexity, logical depth, statistical complexity, and effective measure complexity), while the natural complexity (NaC) and the thermodynamics depth are not rigorously specified as described in Sections 4.1.4 and 4.2.1. Occam's razor principle can be described as "if presented with a choice between indifferent alternatives, then one ought to select the simplest one" [21]. We can specify a complexity definition on Occam's razor principle by adopting the shortest one (expression/time/entropy) for simulating the object. Due to the minimality principle, the definition can be specified uniquely.

Introduction of the Proposed Definition.
We now propose a new quantitative definition of organized complexity. is definition simultaneously captures the three key features of complexity: descriptive, computational, and distributional. It is also computable, is rigorously specified, and can treat both probabilistic and deterministic forms of objects in a unified manner or seamlessly. We give several criteria required for organized complexity definitions and show that the proposed definition satisfies all of them (Table 1). Roughly speaking, the proposed quantitative definition is given by the shortest size of a stochastic automaton form of circuit (oc-circuit) that simulates an object (distribution).
at is, in place of Turing machines and ϵ-machines, the proposed definition employs another class of computational machinery, a stochastic automaton form of circuit, oc-circuit. It is based on Occam's razor principle and rigorously specified ( Table 2).
We now clarify the differences between oc-circuit (circuit), ϵ-machine, and Turing machine. We also show the advantage of using oc-circuit (circuit) in terms of the three key features of complexity.

(i) oc-Circuit versus ϵ-machine
Our approach by oc-circuits is more general than the approach by ϵ-machines for the statistical complexity because an oc-circuit can simulate any ϵ-machine as a special case without the descriptive and computational features. In other words, an ϵ-machine is information-theoretical machinery to produce a stochastic process or hidden Markov process with only the distributional feature. In contrast, an oc-circuit is computational machinery to produce a distribution with all three key features, descriptive, computational, and distributional features ( eorem 2 and Remark 3). Another special case of oc-circuits lacks the distributional feature but has descriptive and computational features. Such an oc-circuit produces (simulates) a deterministic string. See pointer (1) in Section 5.5 for the description of such an oc-circuit.
Hence, oc-circuits include two typically specific cases. A special oc-circuit, ϵ-machine, produces information-theoretical distributions with only the distributional feature. Another is a special oc-circuit, pointer (1) in Section 5.5, which produces deterministic strings with only the descriptive and computational features. In general, an oc-circuit produces any distribution with the three key features. Hence, the proposed definition can treat both probabilistic and deterministic forms of objects in a unified manner or seamlessly. (ii) Circuit versus Turing machine e major difference between circuits [22,23] and Turing machines [22] is that a single (universal) Turing machine can compute any size of input, while a single circuit can compute a fixed size of input. In spite of the difference, any bounded time Turing machine computation with n bit input can be computed by a circuit with n bit input. e finiteness of a circuit (oc-circuit) yields the computability of our definition ( eorem 1). It contrasts with the incomputability of Turing machine-based definitions (Kolmogorov complexity, logical depth, and effective complexity) [21], which is due to the incomputability (undecidability) of the halting problem of Turing machines. (iii) Advantage of using circuits An advantage of basing on circuits is that it can simultaneously capture the three key features: descriptive, computational, and distributional features.
(a) Descriptive feature: An input on a Turing machine for a descriptive feature can be realized for a circuit with the input or hard-wire of the description. If a descriptive feature requires d bit input to a Turing machine, it requires d bit input/hard-wire for a circuit to describe it. Hence, if an object has a d bit descriptive feature represented on a Turing machine (e.g., the Kolmogorov complexity of the object is d), the size of a circuit producing the object should be Ω(d). en, the proposed complexity definition of this object (the shortest size of an oc-circuit simulating this object) should be Ω(d). erefore, the proposed complexity definition can capture the descriptive feature of complexity. (b) Computational feature: A t-time computation of a Turing machine can be computed by an O(t 2 )-size circuit [22]. If a computational feature requires t-time computation on a Turing machine, the size of a circuit to compute it for the computational feature would be Ω(t 2 ). Hence, if an object has a t-time computational feature represented on a Turing machine (e.g., the logical depth of the object is t), the size of a circuit producing the object would be Ω(t 2 ). We now suppose that the data compression by the t-time computation is greater than the computation overhead in terms of the circuit size. en, the proposed complexity definition of this object (the shortest size of an oc-circuit simulating this object) would be Ω(t 2 ). erefore, the proposed complexity definition can capture the computational feature of complexity. (c) Distributional feature: An oc-circuit includes an ϵ-machine structure as a partial structure, which characterizes a distributional feature (Remark 3). If an object has an s bit distributional feature represented on an ϵ-machine (e.g., the statistical complexity of the object is s), the size of an oc-circuit simulating this object should be Ω(s). erefore, the proposed complexity definition can capture the distributional feature of complexity.

Proposed Definition.
We now define a new measure of organized complexity. First, we define our computation model, oc-circuit.
Definition 1 (oc-circuit). Let circuit C with N input bits and L output bits be a directed acyclic graph in which every vertex is either an input gate of in-degree 0 labeled by one of the N input bits, or one from the basis of gates, B: � {AND, OR, NOT}. Among them, L gates are designated as the output gates. at is, circuit C actualizes a Boolean function: Hereafter, we abuse the notation of C to denote C, the canonical description of C.
Here, C is expressed by a specified form C ⊂ 0, 1 { } * , that is, C ∈ C iff C is a valid form of oc-circuit, and it can be efficiently (polynomial-time in |C|) verified whether C ∈ C. e output, Y, of C can be expressed by Y⟵ R C, that is, where the probability of distribution Y is taken over the randomness of K). en, C, u, and m → are called the "logic," "universe," and "semantics" of oc-circuit C, respectively.

Remark 1.
Circuit C of oc-circuit C is a probabilistic circuit, where uniformly random strings, r i ⟵ U 0, 1 { } N r for i � 1, . . . , K, are input to C and the output of C is distributed over the random space of r i i�1,...,K .
Here, note that r i i�1,...,K , which is an input to C, is not included in C, while the other inputs to C, u and m i i�1,...,K , are included in C. In other words, the size of the randomness, K i�1 |r i |, is ignored in the size of C or the definition of the organized complexity (see Definition 2), while Complexity N r and a part of C regarding the randomness are included in C. is is because the randomness, r i i�1,...,K , is just the random source of C's output distribution and has no organized complexity itself. Hence, simply random objects are characterized to have low-organized complexity based on the size of oc-circuit C (see pointer (3) in Section 5.5).
Definition 2 (organized complexity). Let X be a distribution over 0, 1 { } n for some n ∈ N. "Organized complexity" OC of distribution X at precision level δ (0 ≤ δ < 1) is where C ≔ (C, u, n, m → ) is an oc-circuit and |C| denotes the bit length of the binary expression of C (see Section 1.3 for the notations of δ ≈ ).
We call oc-circuit C X ≔ (C X , u X , n, m →X ) the shortest (or proper) oc-circuit of X at precision level δ, iff X δ ≈ Y⟵ R C X and |C X | � OC(X, δ). If there are multiple shortest occircuits of X, that is, they have the same bit length, the lexicographically first shortest one is selected as the shortest oc-circuit. en, C X , u X , and m →X are called the "proper logic," "proper universe," and "proper semantics" of X at precision level δ, respectively. Here,
Proof. For any distribution X ≔ (x, p x )|x ∈ 0, 1 { } n , p x ∈ [0, 1], x∈ 0,1 { } n p x � 1} and any precision level δ > 0, there (Here, note that p x ∈ R is changed to p x ′ ∈ Q provided that X ′δ ≈ X.) We then construct the truth table of Boolean function where the probability is taken over Such a function, f, can be achieved by setting truth table Since any Boolean function can be achieved by a circuit with basis B ≔ {AND, OR, NOT} [42], we construct circuit C * for oc-circuit C * with N s ≔ 1, N u � N m ≔ 0 (i.e., u � m i ≔ λ), N r ≔ ℓ, K ≔ 1, L y ≔ n, and s 1 � s 2 ≔ 0. at is, (0, y 1 ) ≔ C * (λ, 0, λ, r 1 ), and the output of C * is y 1 ∈ 0, 1 { } n with the same distribution as that of X ′ over the randomness of r 1 ⟵ U 0, 1 { } N r . at is, y 1 ⟵ R C * and We then exhaustively check all values of Z with |Z| < |C * | whether Z is a valid oc-circuit such that X δ ≈ Y⟵ R Z. Note that we can efficiently check whether or not Z is the correct form of an oc-circuit or Z ∈ C. Finally, we find the shortest one among the collection of Z (and C * ) satisfying the condition. Clearly, the size of the shortest one is OC(X, δ). □ Remark 2. As clarified in this proof, given object (distribution) X and precision level δ, the proposed definition of organized complexity uniquely determines (computes) not only organized complexity OC(X, δ) but also the shortest (proper) oc-circuit, C X , including proper logic C X , proper universe u X , and proper semantics m →X of X. In other words, the definition characterizes the complexity features of object X, that is, it characterizes not only organized complexity OC(X, δ) but also structural complexity features of X, for example, the descriptive feature by the size of u X and m →X and the computational and distributional features by the size of C X and (N r , N s ) of C X .

Relation between oc-Circuit and ϵ-Machine.
In the following theorem, we show that the notion of oc-circuit with the proposed organized complexity includes the ϵ-machine with statistical complexity introduced in Section 4 as a special case.

Theorem 2. Any ϵ-machine can be simulated by an oc-circuit.
Proof. Given ϵ-machine, ( S 1 , . . . , S k , (T ij ) i�1,..,k;j�1,..,k , (σ ij ) i�1,..,k;j�1,..,k ), we construct oc-circuit C ≔ ((C, N u , N s , N m , N r , L y , s 1 ), u, n, (m 1 , m 2 , . . .) such that N s ≔ ⌈log 2 k⌉ + 1 (i.e., S i ∈ 0, 1 { } N s ), N u ≔ 0, N m ≔ 0, N r ≔ max i,j |T ij | , L y ≔ max ij |σ ij | , s 1 ≔ S 1 (initial causal state), n ≔ ∞, u ≔ λ (null string), m ι ≔ λ (ι � 1, 2, . . .), and C is achieved to satisfy Pr[(S j , σ ij ) ≔ C(λ, S i , λ, r)] � T ij for i, j � 1, .., k, where |T ij | is the bit length of the binary expression of T ij , and the probability is taken over the randomness of r⟵ U 0, 1 { } N r in each execution of C. It is clear that the behavior of this oc-circuit with respect to the causal states is exactly the same as that for the given ϵ-machine. □ Remark 3 (features of the proposed complexity). Our complexity definition can treat more general cases with N u > 0 and N m > 0, while statistical complexity only considers a limited case with N u � 0 and N m � 0, that is, it ignores the descriptive feature as well as the computational feature.
For example, a sequence in a genome pattern that is common to all individuals is considered to be determined in the evolution process and has some biological meaning. e biological knowledge of DNA necessary to understand the DNA sequences can be captured by logic C and universe u of the oc-circuit C (where |u| � N u > 0), and the characteristic information (biological meaning) on each genome pattern ( δ ≈ Y⟵ R C) can be captured by semantics m → of C (where Our complexity definition also covers the computational feature, which is characterized by the size of logic C of occircuit C in our definition.

Structured Organized Complexity.
We can construct a circuit C using another basis of gates, for example, {NAND} and {AND, NOT}, in place of {AND, OR, NOT}. We express an oc-circuit using such a basis by C NAND { } and C AND,NOT { } , respectively. We also express the organized complexity of X using such a circuit by OC NAND { } (X, δ) and OC AND,NOT { } (X, δ), respectively. Based on such a different basis of gates, a natural variant of the proposed organized complexity, structured organized complexity, is given below.
Definition 3 (structured organized complexity). Let X be a distribution over 0, 1 { } n for some n ∈ N. Let C S ≔ (C S , u, n, m → ) be a structured oc-circuit, where C S � (macros, C macros , N u , N s , N m , N r , L y , s 1 ) and macros represents a set of macro gates (subroutine circuits) that are constructed from basis gates and that can be hierarchically constructed, where a level of macro gates are constructed from lower levels of macro gates. In addition, macros is notationally abused as the canonical description of macro gates in macros, which is specified in the same manner as that in the canonical description of a circuit. Term C macros is (the canonical description of ) a circuit constructed from basis gates B as well as macro gates in macros. Structured organized complexity OC S of distribution X at precision level δ (0 ≤ δ < 1) is Theorem 3. For any distribution X over 0, 1 { } n (n ∈ N) and any precision level δ > 0, OC S (X, δ) can be computed.
Proof. Given distribution X, we can construct structured occircuit C * in the same manner as that shown in the proof of eorem 1. Here, note that any (basic) oc-circuit C can be expressed as structured oc-circuit C S where macros ≔ λ, with slightly relaxing the format for structured oc-circuits, or to allow macros ≔ λ.
From the definition of OC S , for any value of 0 < δ < 1, We then, given δ, exhaustively check all values of Z with |Z| < |C * | whether Z is a structured oc-circuit such that X δ ≈ Y⟵ R Z. Finally, we select the shortest one among the collection of Z (and C * ) satisfying the condition. Clearly, the size of the shortest one is OC S (X, δ).
□ Remark 4. As described in Remark 2, the shortest structured oc-circuit with these parameters characterizes the properties of object X. It especially shows the optimized hierarchically structured circuit C S .
As mentioned above, we can construct a structured occircuit using another basis such as {NAND} and {AND, where X is a distribution over 0, 1 { } n and O(1) is a constant in n.
Proof. Let C S be a structured oc-circuit with macros that consists of the NAND gate macro from {AND, OR, NOT} as well as macros from {NAND} (i.e., macros is macros from {AND, OR, NOT}), and C macros be constructed only by NAND and macros. en OC S (X, δ) is upper bounded by |C S | for (X, δ), whose minimum value is OC S (1) Computational Distance: In Definitions 2 and 3, statistical distance is used for defining the closeness ≈ δ . We can replace this with the "computational" closeness ≈ (D ,δ) . Here, for two distributions, X and Y, over 0, 1 { } n , the computational closeness of X and Y is defined by where D is a class of machines D: 0, 1 { } n ⟶ 0, 1 { }. Intuitively, the computational closeness means that X and Y are indistinguishable at precision level δ by any machine in class D (the computational closeness/indistinguishability is widely used in cryptography [43]; for example, D is the class of probabilistic polynomialtime Turing machines and δ is a polynomially negligible function in the security parameter).
(2) Quantum Circuits: Circuit C in Definitions 2 and 3 can be replaced by a "quantum" circuit [44]. In this variation, we assume that the source of observed data is principally given by a quantum phenomenon.

Complexity 13
ere are several variations of the quantum complexity definition, typically: (1) all inputs and outputs of C are classical strings, (2) only state s i is a quantum string, and the others are classical, (3) all inputs and outputs of C except output y i are quantum strings, and (4) all inputs and outputs of C are quantum strings.
(3) Interactions: If an observation object actively reacts similar to a living thing, we often observe it in an interactive manner. So far in this paper, we have assumed that an object is a distribution that we perceive passively. We can extend the object from such a passive one to an active one with interactive observation. Suppose that the observation process is interactive between observer A and observation object B. For example, A first sends z 1 to B, which replies x 1 to A, and we continue the interactive process, z 2 , x 2 , . . ., z J , x J . Let Z ≔ (z 1 , .., z J ) and X ≔ (x 1 , .., x J ) be distributions. We then define the conditional organized complexity of X under Z with precision level δ, OC(X: Z, δ), which can be defined as the shortest (finite version of ) conditional oc-circuit to simulate X (with precision level δ) under Z. e extended notion of the organized complexity of interactive object (X, Z) can be defined by OC(X: Z, δ). (4) Generalization of the Size of Circuit: Some applications may employ a metric (e.g., depth) of a circuit in place of the size of a circuit [45]. We can introduce a variant of the proposed definition using another metric f of an oc-circuit in place of the oc-circuit size |C| such that

Evaluation in terms of the Criteria.
In terms of the criteria described in Section 3, the proposed definition has the following properties (summarized in Tables 1 and 2).
(1) Object: Prob. & Det. e proposed definition can treat probability distributions and deterministic strings (as special cases of distributions) in a unified manner as the objects (good). For example, an oc-circuit of pointer (3) in this section is a case for simulating a probability distribution (simply random), and an oc-circuit of pointer (2) in this section is a case for simulating a deterministic string (simply regular). More generally, for any deterministic string (as a special case of distribution), there exists an oc-circuit with circuit C to simulate the deterministic string by Y ≔ (y 1 , . . . , y K ) n ∈ 0, 1 { } n , such that s i+1 , y i ≔ C u, s i , m i , λ , i � 1, 2, . . . , K, (20) where N r � 0, that is, r i ≔ λ for i � 1, . . . , K. For example, any deterministic form of network as an object can be simulated (produced) by this occircuit with specifying C for the network (see Section 6.2 for an example of an oc-circuit to produce network expressions). (2) Simply Regular Simple (or very regular) objects have low complexity (good). For example, in a very regular case ("11 · · · 1" ∈ 0, 1 { } n ), the object can be simulated by an oc-circuit C ≔ (C, 1, n, λ) with C ≔ (C, 1, 1, 0, 0, 1, 0), such that where N u � 1(u � 1), s i ≔ 0 for i � 1, .., n + 1 (i.e., N s � 1), N m � N r � 0 (i.e., m i � r i � λ), and L y � 1. at is, input size N � 2 and output size L � 2, that is, C has two gates that are input gates labeled by (1,2) and that are also output gates.
(1, 2)), 0, 1, 0, 1, 1, 0), λ, n, λ), and OC(X) ≤ c + log 2 n, where c is a small constant. (4) Key features of Organized Objects As described in Section 5.1 and Remarks 2 and 3, the proposed definition simultaneously captures the distributional, computational, and descriptive features of organized complexity (good). Hence, our definition does not have the drawbacks of the existing definitions described in Section 4. at is, the proposed definition correctly estimates the complexity of highly organized objects for which an existing definition miss-estimates low. In addition, an object estimated by an existing definition high for a feature is also estimated high in our definition. (5) Computability e proposed complexity of any object (distributions/strings) is computable as shown in eorems 1 and 3 (good).

Applications of the Proposed
Complexity Definition is section will show several possible applications of the proposed complexity definition. Exploring the applications is one of the most important issues regarding this notion.

Applications and Approximations.
e Kolmogorov complexity is a quantitative definition for disorganized complexity (or randomness) of deterministic strings, and it is incomputable (Section 4.1). Nonetheless, many applications have been known [21], for example, theoretical background for inductive reasoning and compression theory, lower bound and average-case analysis of algorithms and computational and communication complexity, and information theory.
Several methods of approximating the Kolmogorov complexity have been developed for more practical applications [11,21,39,[46][47][48]. e methods of approximating the Kolmogorov complexity of an object (deterministic string) are closely related to the compression of the object since the Kolmogorov complexity is the size of an optimally compressed (shortest) program (on a Turing machine) to produce the object and its approximation is considered the size of a nearly optimal (program) compression to produce the object. For example, approximation methods based on data compression such as MDL (minimum description length) [3,25,26,49,50] were proposed [39,46]. Approximation methods based on the coding theorem [21], which implies a compression by random sampling, were explored [11,47,48] (which were criticized by [51]). However, Faloutsos et al. [39] emphasized the importance of selecting good models as an art rather than the automated methods on the data compression and the coding theorem.
In contrast to the Kolmogorov complexity, the proposed complexity definition (Section 5) is a quantitative definition for organized complexity of distributions (including deterministic strings as a special case), and it is computable. erefore, it should have more advanced applications than the Kolmogorov complexity, for example, (1) theoretical background for machine learning and artificial intelligence; (2) lower bound and average-case analysis of algorithms, networks, and computational and communication complexity regarding organized matters; and (3) semantic information theory.
Although the proposed definition is computable, computing it for an object generally requires at least an exponential time in the object size. Hence, for practical applications, it should be beneficial to explore efficient methods for approximating it.
Similarly to the Kolmogorov complexity, the methods of approximating the proposed complexity definition of an object (distribution) should be closely related to the (occircuit form) compression of the object, since the proposed complexity definition is the size of the optimally compressed (shortest) oc-circuit to simulate the object and its approximation is considered the size of a nearly optimal (oc-circuit form) compression to simulate the object.
Here, the approximation should be achieved by more sophisticated compression methods (considering the organized structure of the object) than just a randomness compression for the Kolmogorov complexity approximation. Since the oc-circuit captures the three key features of organized complexity, descriptive, computational, and distributional features, such sophisticated compression methods should utilize these three key features. As emphasized by [39], selecting good models and frameworks as an art that captures the object's key features should be important to seek effective approximation methods.

Applications to Network Expressions.
We now suppose that objects are expressed by network forms, which have been widely used for the natural complexity [9,11,27,33,34,36,37]. Note that the proposed complexity definition can treat (a deterministic form of ) network expressions as a particular case of distributions (see pointer (1) in Section 5.5). is section will show how our complexity definition can be applied to network expressions.
If a network (object) is less structured/featured, it should have more compressed (shorter) oc-circuit to produce the network, that is, it has (approximate) lower complexity. Conversely, it should have less compressed oc-circuit if it is more structured/featured, that is, it has (approximate) higher complexity.
For example, we suppose an undirected graph expresses a network with n nodes, represented by an adjacent matrix (n × n upper-triangular binary matrix) and expressed by an n(n − 1)/2 bit string. en, if a network (object) is randomly generated (i.e., almost the least structured/featured), the object should be the uniform distribution of possible networks (not a sampled network), that is, the uniform distribution of n(n − 1)/2 bit strings. en, the oc-circuit shown in pointer (3) of Section 5.5 (over 0, 1 { } n(n− 1)/2 ) produces the uniform distribution of the graphs (networks) and should be nearly the shortest one to produce it. e approximate complexity of this object (the randomly generated network) is the size of the oc-circuit. If a network is very regular (e.g., complete graph; i.e., nearly the least structured/featured), the oc-circuit shown in pointer (2) of Section 5.5 produces the complete graph and should be nearly the shortest one to produce it. e approximate complexity of this object (the complete graph network) is the size of the oc-circuit.
If a network is general, its properties/features such as network structure and topology, which have been extensively studied in the framework and analysis of the natural complexity [9,11,27,33,34,36,37], should be beneficial to obtain a compressed (short) oc-circuit to produce the network. e size of such an oc-circuit is the approximate complexity. For example, [33,35] presented frameworks to characterize and analyze social networks and highlight many technical disciplines. To evaluate the (approximate) proposed complexity of an object (social network), we have to find a compressed representation of the object or a short oc-circuit to produce the network, where the characterization and analysis suggested by [33,35] could be employed.
A network is usually considered deterministic in the natural complexity [9]. However, as shown above, in the random network case, it may be appropriate to consider a network expression as a distribution of networks since some parts of a network are often randomly distributed (e.g., some parts of genome patterns described in Section 2). As shown in Section 3, it may also be helpful to capture the computational feature of networks.

Remark on the Identification of Objects.
is paper focuses on the complexity definition for a given object, and it is outside the scope of this paper how to find or identify the object (Section 2). However, if we analyze or compute the complexity of an object in practical applications, we may have to identify the object before computing the (approximate) complexity of the object. When the object is a distribution, the model selection theory such as MDL [25,26] and AIC [24] would be used to identify the distribution from sampled data. For a variety of complex systems, we may need a method more specific to each system. For example, when the object is a complex system like biochemical systems, a graphical approach with the observability theory has been explored to identify the system [37].

Summary and Challenge.
Here, we summarize several issues about the applications of the proposed definitions.
(1) eoretical Applications As described above, possible theoretical applications of the proposed complexity definition are: (1) theoretical background for machine learning and artificial intelligence; (2) lower bound and average-case analysis of algorithms, networks, and computational and communication complexity regarding organized matters (living things, ecosystems, economic and social systems, artificial things, etc.); and (3)semantic information theory.
(2) Approximation Methods For practical applications, we need an efficient approximation method of the proposed complexity definition. As mentioned above, such an approximation method is closely related to a compressed (short) oc-circuit to simulate the object.
In the computational complexity theory, approximation algorithms for optimization problems have been extensively studied for many years [52]. Here, an approximation ratio, the multiplicative factor on how close the approximate solution is to the optimization solution, is a key factor to evaluate an approximation algorithm.
erefore, such an approximation ratio (of the approximate complexity value over the precise complexity value) is a key factor for an approximation method of the proposed complexity definition. Since the proposed definition generally requires at least an exponential-time computation, a natural question is which approximation ratio is achieved if an approximation method is a polynomial-time algorithm. Similarly, how is the approximation ratio for a sub-exponential-time or exponential-time algorithm? We need both theoretical and practical analyses for the problem.
(3) Applications to Network Expressions e proposed complexity definition can be applied to any form of network expression. Here, the framework and analysis studied in the natural complexity [9,11,27,33,34,36,37] should be beneficial for our complexity definition approach to network expressions.

(4) Evaluation Process
An advantage of our complexity definition for practical applications is that we can clarify the evaluation process, especially separating the phase for identifying the object, the phase for finding a compressed (short) oc-circuit to simulate the object, and the phase for obtaining the (approximate) complexity. e evaluation process of our (approximate) complexity is as follows: (a) Identifying an object We can skip this phase if an object is given in a proper form. Otherwise, we must identify the object or describe it precisely in some model (see Section 6.3). Here, note that an (identified) object can be any form of expression, for example, network expressions or others, deterministic or probabilistic, and so on. (b) Finding a short oc-circuit Given an object, this phase finds a short occircuit to simulate the object as shorter as possible (see Sections 6.1 and 6.2). (c) Obtaining the (approximate) complexity e size of the oc-circuit obtained in the previous phase is the (approximate) value of the proposed complexity definition of the object. If it is an approximate value, the approximation ratio (pointer (2) in this section) may be evaluated heuristically or theoretically.

(5) Coding eorem
Let X be a distribution over n bit strings, δ be a precision level (0 ≤ δ < 1), C be an oc-circuit, and OC(X, δ) be the proposed organized complexity of X at δ (Section 5.2). Let m(X, δ) ≔

Complexity
Note that the binary representation of C is inherently prefix-free (i.e., the Kraft inequality [3] holds, m(X, δ) ≤ 1.) We then conjecture the following inequality, which corresponds to the coding theorem of the Kolmogorov complexity.
OC(X, δ) + log 2 m(X, δ) < O (1). (23) Note that the difference with the coding theorem of the Kolmogorov complexity K(x) (Section 4.1 (1)) is as follows: (1) OC(X, δ) is computable, while K(x) is incomputable; (2) OC(X, δ) includes the computational complexity for C to produce (simulate) the object X, while K(x) does not include the computational complexity for a prefix universal Turing machine U to produce the object x; (3) the object X of OC(X, δ) is a distribution, while the object x of K(x) is a deterministic string; and (4) an oc-circuit C with X δ ≈ Y⟵ R C is used to define m(X, δ) and OC(X, δ), while a program p with x � U(p) is used for m(x) and K(x).

(6) A Problem in the Economy
An attractive but challenging application of our definition is to give the complexity of a country's economy in a single number [12]. If we can express a country's economy with an appropriate (probabilistic or deterministic) description of the economic activities, it would be possible to evaluate our approximate complexity definition of the description, which could be a single number for the organized complexity of the country's economy.

Conclusion
is paper quantitatively defined the organized complexity. e proposed definition for the first time simultaneously captured the three key features of organized complexity: descriptive, computational, and distributional features. It is computable, rigorously specified, and treated both probabilistic and deterministic forms of objects in a unified manner or seamlessly. As a result, it satisfied all of the criteria for organized complexity definitions introduced in this paper.
Organized complexity is an interdisciplinary concept straddling physics, cosmology, chemistry, biology, ecology, sociology, economy, informatics, and so forth. us, the proposed organized complexity definition would be a core notion in such interdisciplinary areas and offer some basis for tackling the problems posed in [2,20] and Section 6.

Data Availability
No data sets were generated or analyzed during the study.

Conflicts of Interest
e author declares that there are no conflicts of interest.