Evidence for mixed rationalities in preference formation

Understanding the mechanisms underlying the formation of cultural traits, such as preferences, opinions and beliefs is an open challenge. Trait formation is intimately connected to cultural dynamics, which has been the focus of a variety of quantitative models. Recently, some studies have emphasized the importance of connecting those models to snapshots of cultural dynamics that are empirically accessible. By analyzing data obtained from different sources, it has been shown that culture has properties that are universally present, and that empirical cultural states differ systematically from randomized counterparts. Hence, a question about the mechanism responsible for the observed patterns naturally arises. This study proposes a stochastic structural model for generating cultural states that retain those robust, empirical properties. One ingredient of the model, already used in previous work, assumes that every individual's set of traits is partly dictated by one of several, universal"rationalities", informally postulated by several social science theories. The second, new ingredient taken from the same theories assumes that, apart from a dominant rationality, each individual also has a certain exposure to the other rationalities. It is shown that both ingredients are required for reproducing the empirical regularities. This key result suggests that the effects of cultural dynamics in the real world can be described as an interplay of multiple, mixing rationalities, and thus provides indirect evidence for the class of social science theories postulating such mixing. The model should be seen as a static, effective description of culture, while a dynamical, more fundamental description is left for future research.


I. INTRODUCTION
A solid theoretical understanding of how preferences form is currently lacking. There is little doubt that preferences, opinions, values and beliefs, which are generically referred to as "cultural traits", are dynamical entities, and that interpersonal social influence plays an important role in driving their dynamics, among other factors. Moreover, a complete theoretical understanding should account for the fact that dynamics of traits takes place in parallel along multiple dimensions, namely that opinions and preferences can develop in relation to multiple topics or aspects of life. Along these lines, various dynamical models been developed and studied [1], such as the Axelrod model [2], which is very representative for studies of multidimensional dynamics, commonly referred to as "cultural dynamics", in contrast to studies of unidimensional dynamics, commonly referred to as "opinion dynamics". Recent studies [3][4][5] have shown that models of cultural dynamics are sensitive to the initial conditions, namely to how the initial vectors of agents' traits are chosen: initial cultural states constructed from empirical data show systematic deviations from their shuffled and random counterparts. In fact, Ref. [5] argues that such deviations point towards universal structural properties inherent in empirical cultural states. More insights about the formation of cultural traits should be achievable by studying these states, since they can be regarded as partial snapshots of cultural dynamics in the real world.
The universal properties mentioned above are expressed in terms of the effects the empirical cultural state has on social influence models using it for their initial conditions -here, a "cultural state" is a set of cultural vectors (SCV), where each cultural vector encodes the sequence of cultural traits associated to one agent in the model. On one hand, one uses an Axelrodtype model [2] of (multi-dimensional) cultural dynamics to evaluate a measure of cultural state propensity to long-term cultural diversity (LTCD). On the other hand, one uses a Count-Bouchaud-type model [6] of (onedimensional) opinion dynamics to evaluate a measure of cultural state propensity to short-term collective behavior (STCB). Both measures are functions of a common parameter, controlling for the range of social influence in cultural space, which allows for an LTCD-STCB correspondence to be drawn for a given cultural state. It turns out that an empirical cultural state generally induces an LTCD-STCB curve that is close to the second diagonal (LTCD(ω) ≈ 1 − STCB(ω), ∀ω), while exhibiting, for a given STCB value, higher LTCD values than a trait-shuffled cultural state, which in turn exhibits higher LTCD values than a randomly generated counterpart [3,5]. These results are universal [5], namely independent of the data set used for constructing the cultural vectors composing the empirical cultural state, suggesting that real-world cultural dynamics is governed by universal laws. Moreover, as argued in Ref. [5], this type of analysis suggests that inter-agent social influence, the essential ingredient of cultural dynamics models, is insufficient for explaining the observed structure. Although it is meaningful to think of incorporating additional ingredients into social influence models, in an attempt to give rise to empirical-like structure in a dynamical setting, this is not what the current study aims at. Instead, it aims at providing an effective, phenomenological, static description of the observed structure, which should provide additional insights before a more fundamental, dynamical description is to be attempted.
The purpose of this study is to develop a structural stochastic model that would generate realistic cultural states, while incorporating plausible ingredients from social science. Specifically, these states should retain the universal properties inherent to empirical cultural states that are observed in Ref. [5]. In fact, Ref. [4] has already investigated various ways of generating sets of cultural vectors in random, but non-uniform ways. A method that appeared particularly promising relied on the notion of "cultural prototypes": a few underlying, abstract sequences of logically compatible, self-enforcing cultural traits, which govern the way the generated vectors are distributed in cultural space. According to the method, each cultural vector is partly a copy of one of the prototypes and partly random. The implicit claim is that each cultural prototype is induced by one of a few (3 to 5) fundamental and universal "principles of social life", or "rationalities", that would strongly affect any process of trait formation in any social system. Such entities are postulated, under different names and in slightly different numbers, by several theoretical frameworks in social science [7][8][9][10][11]. All these frameworks are appealing from a natural science perspective, since they exhibit a certain reductionist tendency of trying to understand the observed socio-cultural variability in terms of combinations of a few, elementary and universal building blocks. Moreover, some of them are explicit attempts to unify social science. Various parallels and similarities between these theories are discussed in the literature [12][13][14]. For the purpose of the current study, all these theories are equivalent. Still, for creating an instructive and compact context, the discussion is restricted to one of them, namely to Plural Rationality Theory, chosen for reasons discussed in Sec. V.
Plural Rationality Theory (PRT), also referred to as "(Grid-Group) Cultural Theory" [7], is a qualitative description of socio-cultural structure and dynamics as an interplay between a small number of irreducible "ways of life", or "rationalities". These ways of life are understood as abstract, "elementary building blocks" of societies and are supposed to be recognizable regardless of the geographical context, of the historical context or of the scale of the system that one looks at. It is claimed that the ways of life always coexist, although either of them is often dominant for a given period of time, for a given (part of) the system that one studies 1 . Moreover, each way of life is understood as a self-enforcing combination of a "pattern of (social) relations" and a "cultural bias". On one hand, a pattern of relations is often understood as a 1 It may be useful to think of the ways of life as being the elements of a complete, orthogonal basis of some abstract vector space.
One may then associate a vector in this space to a certain part of a certain socio-cultural system, at a given moment in time. It is not clear to what extent such vectors would be related to the cultural vectors used in this study. This is only a semi-formal analogy that is not exploited further here, nor in any other study so far, to the extent that the authors are aware of.
tendency of organizing the social ties between people in a certain way, thus a connectivity pattern in the social graph. On the other hand, a cultural bias is a combination of preferences, opinions, values and beliefs that are compatible with each other and with the associated pattern of relations. By comparison to the definitions in Ref. [5], one can easily realize that a cultural bias can be thought of as a point or a region in "cultural space" that is representative for the respective "way of life". This is essentially what the notion of "cultural prototype" used here (and in Ref. [4]) stands for. Interestingly, PRT distinguishes between a "social plane" and a "cultural plane" of interacting human systems, while acknowledging the dynamical nature of both and the strong coupling and interdependency between the two. Thus, PRT seems to resonate well, on one hand to research on social network structure and dynamics, on the other hand to research on cultural structure and dynamics. Up to now, little work has been done to explore each of these two connections. While Ref. [4] and the present work are the first steps in exploring the latter connection, some steps have also been taken in exploring the former connection [15,16]. Note, however, that Ref. [4] refers to several theories similar to PRT, without explicitly mentioning PRT, that Ref. [16] focuses on a social theory similar to PRT, while still discussing a connection with PRT and that Ref. [15] works with an earlier, more rudimentary version of PRT, which gave less importance to the notions of "way of life", "rationality" and "cultural bias". Although the coupling between social dynamics and cultural dynamics is recognized and studied by quantitative complex systems research (for instance Refs. [17,18]), this has been carried out in isolation from PRT. In the language of PRT, one cultural prototype corresponds to one cultural bias, which, in loose terms, is in turn the cultural plane projection of one way of life. For the purpose of this study, in agreement with Ref. [4], a cultural prototype is a combination of cultural traits, thus one point in cultural space -as discussed in Sec. V, for certain purposes, this definition should be extended in future work. Relying on this notion, two stochastic, structural models of culture are developed and studied here. The first stochastic model, called "Prototype Generation" (PG), generates every cultural vector by copying a random sequence of traits from one of the k prototypes, while generating the other traits in a uniformly random way. This effectively specifies that there are k "classes" of cultural vectors and those of a certain class are located at a certain average distance from the respective cultural prototype, quantity which can be controlled for. The points in cultural space associated to the prototypes are also randomly chosen, in a manner that allows for the expected distance between the prototypes to also be controlled for. The Prototype Generation method used here is similar to the "Prototype Evolution" method of Ref. [4]. There are small differences between how exactly the vectors are generated in the two cases. Moreover, the method of Ref. [4] did not allow for controlling the expected cultural distance between the prototypes. It is shown that cultural states generated by PG are structurally dissimilar to the empirical ones, as they do not exhibit the universal LTCD-STCB behavior, after tuning the free parameters to empirical data in terms of simpler, but meaningful quantities. Therefore, this work proposes a second, arguably more realistic model.
The second, more stochastic model is called "Mixed Prototype Generation" (MPG). As an additional ingredient with respect to PG, this model assumes that every generated cultural vector is a mixture (or combination) of all the prototypes, while still maintaining that one of the prototypes is typically dominating a certain cultural vector. From the perspective of PRT, this is a formal realization of the idea that every person combines the ways of life in a unique way, such that preferences and opinions related to different aspects of life -cultural traits of different cultural features (or variables) -are due to the "influence" of different cultural biases, though at any given moment in time one cultural bias is usually dominating. In the literature concerned with PRT and the other, similar, theories, this mixing aspect often goes under the name of "the multiple self", and was not implemented in Ref. [4]. One consequence of this ingredient is limiting the usage of explicitly random trait generation, implying that cultural vectors are more strongly constrained by the prototypes, while still allowing for a large variety of possible cultural vectors. It turns out that cultural states generated with MPG are structurally similar to the empirical ones, as they reproduce the universal LTCD-STCB behavior, after tuning the free parameters in a way that is analogous to the tuning of PG.
The rest of this manuscript is organized as follows. Sec. II provides the technical description of the two models. Sec. III provides a description of the tuning of the free model parameters, which is done as to reproduce some lower-order properties of one of the empirical cultural states. Based on this tuning, Sec. IV shows that MPG is clearly better than PG at reproducing the empirical LTCD-STCB curves, thus pointing out the importance of the "mixing" ingredient. Sec. V further discusses certain ideas and results presented throughout the study, as well as questions that are worth investigating in the future. The manuscript is concluded in Sec. VI.

II. MODEL DESCRIPTION
This section describes the two stochastic models of culture: the Prototype Generation (PG) model and the Mixed Prototype Generation (MPG) model, which are to be used to generate sets of cultural vectors (SCVs) that can be quantitatively studied with the LTCD-STCB tool, previously applied to empirical SCVs in Refs. [3,5]. Both models rely on the concept of cultural prototype introduced above. For the purpose of this work, the general set-up is restricted to cultural spaces defined in terms of features that are exclusively nominal. Disregarding ordinal features makes the modeling paradigm compatible with the (arguably strong) assumption that one prototype corresponds to one point in cultural space, meaning that a prototype picks up one and only one trait of any given feature. Other limitations of this assumptions are extensively discussed in Sec. V, together with possible ways of relaxing it, for the purpose of generalizing the current modeling paradigm in future work.
The two models are schematically illustrated in Fig. 1. The figure first shows a sketch of an empirical SCV, where the rows correspond to cultural features, the columns correspond to cultural vectors and the letters correspond to cultural traits -the n'th row shows the traits of the N agents that are expressed (or formulated) with respect to the n'th feature. Then, it shows a set of 3 cultural prototypes (their number could have been different), in 3 different colors, all of them spanning over all features (or questions) relevant for the empirical set of vectors. Finally, it illustrates a typical set of vectors generated using the PG method, followed by one generated using the MPG method. The colors distinguish between the prototypes and indicate how the traits are copied from the prototypes to the cultural vectors, while black denotes traits that generated in an explicitly random way (uniform distribution, independently of the prototypes).
There are several things worth noting in relation to Fig. 1. First, the possibility that two or more prototypes pick the same trait for a certain feature is allowed by the current modeling paradigm, which is essential for controlling the average prototype-prototype distance, as will become apparent below (note that any of the traits that can be copied from one of the prototypes can also be generated via explicit randomness). Second, a PG vector is partly copied from one prototype and partly generated in an explicitly random way, while a MPG vector is a mixture of copies from all the prototypes, with one dominating prototype and with few traits generated in an explicitly random way. Third, both models make use of another type of randomness, in addition to the explicitly random trait generation and to the randomness involved in generating the prototypes. This randomness enters in assigning every trait of every vector to a "prototype of origin" once the random generation fraction and the influence fractions of the prototypes are specified. In the case of MPG, it is mainly this trait-assignment randomness that allows for the generation of a multitude of distinct cultural vectors from a small set of fixed prototypes, in the presence of little explicitly random trait generation.
The procedure for generating the cultural prototypes is the same for both the PG and the MPG models. One needs to specify the number of prototypes k, as well as the value of another parameter α ∈ (0, 1), which controls for the expected cultural distance between the prototypes. This parameter governs the expected number of overlaps (or coincidences) between prototypes in terms of how they are distributed over the traits of a specific x1 x2 x3 xN x4 x5 FIG. 1. Schematic illustration of the two stochastic models, showing (from left to right): an empirical SCV with N vectors (x1 to xN ) and F nominal variables (Q1 to QF ); a set of k = 3 cultural prototypes for the same F variables; a SCV with N vectors generated, from the prototypes, using the PG model; a SCV with N vectors generated, from the same prototypes, using the MPG model. For the PG and MPG sketches, red, green and blue denote copies of cultural traits from one of the first, second and third prototype respectively, while black denotes explicitly random generation of traits.
feature. In the extreme case of α → 1, all prototypes pick the same trait for every feature, yielding the smallest possible separation between the prototypes in cultural space (which coincides with the minimum of 0 allowed by the cultural distance definition in Eq. (5). In the other extreme case of α → 0, the prototypes are distributed as uniformly as possible over the traits of every feature, yielding the largest possible separation between the prototypes in cultural space (which only coincides with the maximum of 1 allowed Eq. (5) if the number of traits q is larger or equal to the number of prototypes k for every feature). This is achieved by a formulation in terms of the set of integer partitions I q k describing the possible ways how the k prototypes can be distributed over the q traits of a certain feature. The α parameter actually controls the probability distribution over the set I q k , via the "compactness" of the integer partitions in this set. Sec. A 2 precisely describes how these probabilities are assigned and how the set I q k is computationally generated in the first place, for any combination of k and q. Once the prototypes are chosen, everything else is conditional on them, for both models.
According to the Prototype Generation (PG) model, each cultural vector is a partial realization of one of the prototypes. Thus, for each of the N cultural vectors that need to be generated, one of the k prototypes is randomly chosen. Then, a subset of the F features of length round(β · F ) is randomly and independently selected for each vector and the traits of these features are copied from the prototype to the vector. Here, "round" returns the integer that is closest to its argument, while β ∈ [0, 1] is a third model parameter, in addition to k and α (which are already needed for the purpose of specifying the prototypes, in the manner described above). The β parameter specifies the fraction of traits that are directly copied from the prototype, thus controlling for the expected distance between a vector and its prototype. The traits for the remaining features are generated randomly and independently, according to uniform feature-level probability distributions -the explicit random generation mentioned above. Thus, β also controls for the amount of explicitly random generation of traits.
According to the Mixed Prototype Generation (MPG) model, each cultural vector is a combination of all prototypes, although an unbalanced combination, meaning that the numbers of traits copied from the different prototypes are deliberately unequal. The extent of this discrepancy is explicitly controlled by the third model parameter, which, like for PG, is called β. Although the exact definition and usage of the β ∈ (0, 1) parameter is different in MPG than in PG, its role is quite similar. Specifically, also in the context of MPG, β (indirectly) controls for the fraction of traits copied from the dominating prototype to the vector: more traits are copied from the dominating prototype if the discrepancy between the prototypes is higher. In addition to traits copied from the prototypes, there are traits that are generated in an explicitly random way, but in a small number. For each generated vector, this number is by construction not higher than the number of traits copied from the lowest-contributing prototype. Consequently, if there are k prototypes, the number of traits generated via explicit randomness does not exceed F/(k +1). Thus, 1/(k + 1) is an upper bound for the fraction of explicit randomness in an entire set of cultural vectors generated with MPG. It is also important to note that, like for PG, this fraction is controlled by β and that the upper bound is reached when β is in the limit of minimal imbalance.
The MPG model needs a procedure of specifying, for each generated vector, the k values of the numbers of traits that are to be copied from the k prototypes, along with the number associated to explicitly random generation. These k + 1 positive, integer numbers should add up to F and have their discrepancy controlled by the β parameter. Moreover, there is no reason to believe that the sequence of numbers associated to one β value should be the same across all generated vectors, so randomness should be involved in choosing these numbers. Therefore, the model needs a probabilistic way of drawing k + 1 random, positive integers {t 1 (β), .., t k+1 (β)} satisfying k+1 l=1 t l (β) = F , such that their expected discrepancy is controlled via a single parameter β. The procedure chosen for this purpose is described below.
This procedure heavily relies on isometrically map-ping the discrete {0, 1, .., F } set of integers to the [0, 1] interval of the real axis. For each generated vector, the latter interval is split into k + 1 parts, by performing "cuts" in k randomly chosen points. In this manner, a sequence of k + 1 preliminary weights {W 1 , ..., W k+1 } subject to k+1 l=1 W l = 1, is numerically obtained. These weights are obviously independent of β and have a fixed expected discrepancy. A β-dependent transformation (explained below) is applied on the preliminary weights {W 1 , ..., W k+1 }, thus providing a sequence of β-dependent weights {w 1 (β), ..., w k+1 (β)} satisfying k+1 l=1 w l (β) = 1, with expected discrepancy controlled by β. Finally, the sequence of β-dependent weights is converted back to the desired sequence {t 1 (β), .., t k+1 (β)}. This final operation is non-trivial, requiring a self-consistent, joint rounding procedure, which is generally difficult to choose, since one cannot generally ensure that w l = round(t l /F ), ∀l -a non-trivial problem of weight discretization. Here, a simple, pragmatic choice is made: converting the lowest k weights to the closest, lower integer, while converting the highest weight to the integer needed for satisfying the summation constraintthis ensures that the highest weight, which should correspond to the dominating prototype, is converted to the highest integer.
Before describing the fitting and the outcomes of the PG and MPG models, it is worth summarizing a few important aspects. Both models rely on the notion of cultural prototypes, which is currently formalized in a simplistic manner, which is only sensible for cultural spaces defined exclusively in terms of nominal features. The procedure for generating the prototypes is the same for both models and relies on two parameters, k and α, which specify, respectively, the number of prototypes and the expected distance between them. The differences between PG and MPG consist in how the cultural vectors are generated conditionally on the prototypes: for PG, every vector is in part a copy from one of the prototypes and in part explicitly random; for MPG, every vector is an imbalanced mixture of all prototypes and explicitly random to a much lower extent, which is how the "multiple-self" ingredient is implemented. Nonetheless, in both cases, there is a third model parameter, β, which governs, in different ways, the lengths of the randomly selected subsets of features whose traits that are copied from the prototypes. In both cases, β effectively controls for the expected distance between a vector and its (dominating) prototype, as well as for the fraction of explicit randomness.

III. MODEL FITTING
Before applying the LTCD-STCB analysis on SCVs generated with either the PG or MPG models, it is useful to somehow constrain some of the free model parameters. This is done in terms of statistical quantities simpler than the LTCD and the STCB measures, that can be evaluated on both empirical SCVs and on the model SCVs.
On the empirical side, the quantities are averaged over several, empirical SCVs constructed by randomly selecting N = 500 cultural vectors from the 13000 available ones in Eurobarometer data set [19], while restricting to the nominal features -let "(EBM n )" stand for the nominal part of the Eurobarometer data set. On the model side, these quantities are averaged over many SCVs, of the same size N , that are realizable in the cultural space of (EBM n ), for the given combination of parametersthe prototypes are independently generated upon creating every model SCV.
The two simple quantities in terms of which the models are to tuned to empirical data are the average and the standard deviation of the inter-vector distances in the SCV, which are here denoted by "AIVD" and "SIVD" respectively: where N is the number of cultural vectors and d ij is the cultural distance, as defined and used in Refs. [3][4][5]. The notation i < j denotes that the respective summation is carried out over all distinct pairs (i, j). In the case of a fully-nominal cultural space, with which this study is dealing, d ij reduces to the Hamming distance between the two sequences of symbols encoding cultural vectors i and j: where l iterates over the F nominal features, x l i , x l j are the traits of vectors i and j with respect to feature l and δ stands for the Kroneker-Delta function. Obviously, d ij takes values within the [0, 1] interval. The second equality shows that the cultural distance can be expressed as an average over feature-level contributions, which becomes useful below. Previous work has shown that an empirical SCV is characterized by a lower AIVD than its random counterpart and a higher SIVD than both its random and shuffled counterparts [3,4].
It is instructive to see that the expressions of AIVD and SIVD can be rewritten in the following way: by using a feature-level cultural distance d l ij introduced via Eq. (5) -the transition from (4) to (7) was suggested by the SI of Ref [3].
Note that the AIVD can be understood as an average over feature-level AIVD contributions, which are represented by the expression within the k-summation of Eq. (6). It can be checked that, in the case of a nominal feature, the AIVD contribution is a measure of how uniformly the N vectors are distributed over the possible traits of that feature. This is more obvious when expressing the expected value of the AIVD contribution in terms of probabilities associated to the traits, which is shown in Eq. (8) below. Thus, for an empirical SCV containing only nominal features, the AIVD is a measure of average uniformity of the empirical frequency distributions associated to the features. Consequently, the AIVD is also a measure of how subjective the questions/topics associated to the features are on average -when the frequencies of possible answers are more similar to each other, there is less justification to talk about a "better", a "more correct" or a "more agreed upon" answer, so the question is inherently more subjective.
Also note that, in Eq. (7), the quantity inside the average over pairs of features (k, l) is the covariance between features k and l, defined in terms of the feature-level cultural distances. Given that this quantity is averaged over all possible pairs of features and that the squareroot is a monotonous function, the SIVD encodes information about the pairwise correlations between features, although in a somewhat indirect way.
For both models, the choice made here is that of: • tuning the α parameter in terms of the AIVD quantity (Eqs. (3), (6)), for any combination of values of the β and k parameters; • tuning the β parameter in terms of the SIVD quantity (Eqs. (4), (7)), for any value of the k parameter, based on the previous fitting of α in terms of AIVD; • simply repeating the tuning and the LTCD-STCB analysis for several values of k.
This implies that, for every value of k, the tuning (or fitting) is done at two levels: the α-AIVD level and the β-SIVD level, the former being nested into the latter.
In practice, the fitting is carried out automatically, using a nested, 2-levels algorithm that relies on a modified bisection-type method for each level. The algorithm is precisely described in Sec. C. In order to work, this approach relies on the assumption that there is one, unique solution for the fitting problem, for every value of k. This uniqueness is demonstrated via Figs. 2 and 3 below, which are also used for providing a general intuition of how the fitting works and of how the AIVD and SIVD quantities depend on α, β and k, for the two models. Before entering this description, it is worth mentioning that the computer time for the fitting algorithm is greatly reduced by being able to evaluate the average AIVD quantity analytically, in a manner that properly accounts for all SCVs that can be generated for any combination of k, α and β. While the calculation is described in detail in Sec. B, a schematic understanding can already be provided here. The essential ingredient of the calculation is a simple, exact formula for the expected AIVD contribution of one feature of range q: which assumes that the probabilities of its traits {p 1 , ..., p q } are all known -see Sec. B for the proof. For a discrete probability distribution, eq. (8) is a measure of uniformity very similar to the Shannon entropy. Conditional on a specific choice of the prototypes, this set of probabilities (thus the feature-level probability distribution) is fully determined by the integer partition describing how the prototypes are distributed over the traits and by the fraction of traits that are randomly generated, the latter being controlled by β. In this context, Eq. (8) already assumes that an averaging is performed over SCVs generated from the same set of prototypes. One still needs to perform an average of this expression over integer partitions (Eq. (B2) of Appendix Sec. A 1), according to the probability distribution controlled by α   els. Moreover, it shows the empirical AIVD uncertainty range 2 via the horizontal bands in the six panels. Thus, a solution of the first-level fitting is indicated by an intersection between a model curve of a given combination of k and β and the horizontal band. Note that, for either of the two models and for any combination of k and β, if a solution exists, this solution is actually unique. In order to understand the behavior implicit in Fig. 2, which is explained below, one should keep in mind that AIVD measures the average uniformity of the featurelevel probability distributions.
First, it is worth focusing on the AIVD dependence on the α and β parameters. Note, on one hand, that for a given combination of k and β, the AIVD generally decreases with α, or at least remains constant. This is due to the fact that the AIVD decreases with decreasing distance between prototypes, thus with increasing α. For PG, this decrease is stronger for higher β values, since for low β value the uniformity is anyway high, because of the large fraction of randomly generated traits. For MPG, this β-dependence of the decrease is not that strong, since the fraction of randomly generated traits cannot exceed 1/(k + 1). On the other hand, for a given combination of k and α, the AIVD generally decreases with increasing β. This is due to the fact that the AIVD decreases with decreasing fraction of randomly generated traits, thus with increasing β.
Second, it is worth focusing on the AIVD dependence on the number of prototypes k. For PG, for a given α, a larger number of prototypes k implies a higher AIVD, since traits copied from prototypes are more uniformly distributed, but this has a significant effect only for large β values, again due to the uniformity being anyway in place at for small β values. For MPG, the corresponding behavior is more subtle. While for large, β → 1 values, the AIVD still increases with increasing k at a given α (for the same reason as for PG), the AIVD(α) curves corresponding to small β approach the AIVD(α) curve corresponding to large β → 1 with increasing k, rather than remaining in place (which is the case for PG). This is related to the fact that the upper bound on the fraction of randomly generated traits 1/(k + 1) decreases with increasing k, thus decreasing the role of β in controlling the AIVD via the uniform component of the feature-level probability distributions. Fig. 3 deals with the second-level fitting. Everything that this figures shows relies on α already being tuned (at the first level) such that the empirical AIVD is matchedas apparent from Fig. 2, the tuned α value depends on β and on k. Fig. 3 shows the dependence of the numerically computed SIVD quantity (with uncertainty ranges) on the β parameter, for several k values and for both the PG and MPG models. Moreover, it shows the empirical SIVD uncertainty range via the horizontal bands in the two panels. Thus, a solution of the second-level fitting is indicated by an intersection between a model curve of a given k and the horizontal band. Note, again, that for either of the models and either of the k values, if a solution exists, this solution is actually unique. The exact technical procedure employed for producing any of the model points in Fig. 3 is described in Sec. C 4 of the Appendix. Sec. C 4 also describes how the final choice of values for the α and β parameters is made.
Note that the SIVD increases with β for both models and for all k values, conveying that the extent of featurefeature correlation increases with decreasing distance between vectors dominated by the same prototype. For PG, all SIVD(β) curves meet for some β ≈ 0.45, at which point they also end. No points are plotted for lower β because α cannot be tuned in terms of AIVD, which can be understood from Fig. 2 when noticing the AIVD(α) curves of low β that do not cross the empirical line. For MPG, the SIVD(β) curve of k = 2 ends at a value of β ≈ 0.5, before crossing the empirical line, meaning that the MPG model cannot be entirely tuned when only 2 prototypes are used. No points are plotted for higher β because α cannot be tuned in terms of AIVD, which can be understood from Fig. 2 when noticing the AIVD(α) curves of k = 2 and high β that do not cross the empirical line. This is due to certain limitations of the current modeling paradigm, which are further discussed in Sec. V.

IV. MODEL OUTCOMES
Here, the most important results of this work are presented. The focus is on the LTCD-STCB analysis, applied to sets of cultural vectors generated with the PG and MPG models. The aim is to assess how well the two models reproduce the universal empirical patterns described in Ref. [5]. Fig. 4 illustrates the results obtained with the two models, whereas Fig. 5 summarizes, for comparison purposes, the empirical results, focusing on the nominal part of the Eurobarometer dataset (EBM n ).
Before describing the results, it is worth recalling the main ingredients of the LTCD-STCB analysis. This is essentially a two-dimensional plot showing the correspondence between the LTCD quantity vs the STCB quantity, both of them being evaluated on empirical, on shuffled and on random SCVs. Drawing the LTCD-STCB correspondence is made possible by the fact that, for each of the three scenarios, both quantities depend on the bounded-confidence threshold ω, which controls the maximal cultural distance over which social influence can act. On one hand, the LTCD quantity is a measure of cultural diversity after a long-term process of cultural dynamics driven by ω-bounded social influence, starting from an initial cultural state specified by the respective type of data. Essentially, it counts the number of distinct points in cultural space (commonly referred to as "cultural domains") towards which the agents converge in the final state of a minimalisitic, bounded-confidence Axelrod model. The STCB quantity is a measure of collective behavior (or social coordination) after a shortterm process of opinion dynamics driven by ω-bounded social influence. Essentially, it is the standard deviation of the aggregate opinion distribution of the agent population, resulting from a minimalistic Cont-Bouchaud-type model applied to the (cultural) graph obtained by drawing a link for each pair of agents separated by a cultural distance smaller than ω. Mathematically, the two quantities, as functions of the bounded-confidence threshold ω, are captured by the following two expressions: where N D is the number of cultural domains in the final state of the Axelrod-type model, N is the number of agents (and cultural vectors) and S A is the size of the A'th of connected components in the ω-determined cultural graph. The average in the LTCD formula is taken over multiple simulations of the Axelrod-type model. The STCB quantity is calculated analytically, once the cultural connected components are found, based on the assumption of opinion-agreement within each connected component and independence between connected components. An essential difference between the two quantities, reflected in the long-term/short-term distinction, consists of an idealized separation between two timescales, in terms of the role that the SCV specified as input plays: cultural vectors, together with the distances between them, are assumed to be dynamical by the LTCD definition and static by the STCB definition, such that one deals with dynamics of vectors and with dynamics on vectors in the two cases respectively. The interested reader is referred to Refs. [3,5] for more details and remarks about the LTCD-STCB analysis. For both the PG and the MPG models, the α and β parameters are tuned in the manner described in Sec. III for every value of the number of prototypes k, while the latter is simply iterated over. In Fig. 4, the LTCD-STCB plot is shown for the values k = 3, k = 4 and k = 5, for the PG (left) and the MPG (right) models. The value k = 2 is omitted since the α and β parameters could not be both tuned for MPG with two prototypes. All SCVs are generated using the cultural space of EBM n , whose empirical SCVs also served for providing the AIVD and SIVD values in terms of which the tuning was conducted.
When looking at Fig. 4, one should ask whether the universal, empirical patterns are reproduced by any of the six illustrated model scenarios. Qualitatively, the patterns are defined first in terms of a higher compatibility between LTCD and STCB in the model-generated SCV than in the shuffled SCV and a higher compatibility in the shuffled SCV than in the random one, second in terms of the model-generated LTCD-STCB curve being close to the second diagonal. These empirical features are visible in Fig. 5. It is clear that PG does not satisfy these criteria for any value of k. Indeed, the model-generated curve is far below the second diagonal for most of the relevant interval and often below the shuffled curve. MPG, however, appears to satisfy all these criteria for all k values, though for k = 3 it is not obvious that the shuffled curve is indeed above the random one, due to the lack of points in the lower-left corner. This has to do with an effective discreteness of the bounded-confidence threshold ω spectrum, due to the finite number of nominal features available -in other words, since it is meaningless to split the ω axis into intervals that are smaller than For a direct comparison with analogous empirical curves, one should use Fig. 5, which shows the results of the LTCD-STCB analysis applied to EBM n data. However, it is only meaningful to compare the qualitative nature of the empirical and the model curves, rather than the exact values, since, as discussed in Sec. V, neither model has a maximum-likelihood nature, due to a certain simplicity in the way prototypes are formalized and chosen here. Still, MPG apparently does generate SCVs that are structurally similar to the empirical ones. Thus, the notion of cultural prototypes, even if implemented in a simplistic way, can be used to reproduce the important, universal properties of empirical cultural space distributions, as long as mixing of prototypes is in place.

V. DISCUSSION
The purpose of this study was to develop a way of generating cultural states that reproduce the universal properties of the empirical ones, namely those described by Ref. [5]. For this purpose, it was natural to use input from social science, in particular from social science theories that are intended to describe universal aspects of culture and society. There is an entire "class" of social science theories that appear relevant for this purpose, some originating in psychology and some in cultural anthropology [7][8][9][10][11]. All of them make use of cultural prototypes, although in somewhat different ways, under different names and numbers. Moreover, they had all been overlooked by previous studies of cultural dynamics, on which Ref. [5] largely builds: Ref. [4] was the first study that connected these theories, via the generic, formal notion of "cultural prototypes", to quantitative studies of cultural dynamics. For creating an instructive and compact context, this work focused on one of these theories, namely on Plural Rationality Theory (though this was not explicitly mentioned by Ref. [4]). There are several reasons why PRT was chosen for this purpose, in addition to its explicit social-vs-cultural distinction mentioned in Sec. I. First, its informal notion of cultural bias matches very well the more formal notion of cultural prototype, in the manner used in Ref. [4] and here. Second, it explicitly claims to provide some insight into how preferences form: preferences are formed in the process of building social relations, while different patterns of relations (and types of institutional settings) go along with different types of preferences. Third, it is more appealing from a natural science perspective than the others, in particular from a physics and complex systems perspective. This is largely due to various concepts that are qualitatively (and sometimes just implicitly) invoked by PRT, such as: energy landscapes, symmetry breaking, graph/network theory, dynamical systems, crossovers (possibly phase transitions), self-organization and fractals.
Based on the notion of cultural prototypes, two stochastic models have been studied here: Prototype Generation (PG) and Mixed Prototype Generation (MPG). It is important that, regardless of which model is used, once the prototypes and the remaining free parameters (parameter β in for either PG and MPG) are specified, one implicitly defines a cultural space distribution, according to which points are then selected at random. Thus, the resulting cultural states are still generated randomly, but in a highly non-uniform way, which depends on the prototypes and other model specifications.
For this study, the usage of both stochastic models is restricted to cultural spaces constructed only from sets of nominal features. This is due to the assumption that every prototype picks one and only one trait in any feature, which from a PRT perspective means that, upon answering a question under the influence of one cultural bias, a respondent can only provide one specific answer. In reality, even a specific cultural bias would generally point towards several answers, although with different probabilities, so it would be more realistic to say that every prototype corresponds to one probability distribution defined over that feature. Not allowing for this freedom makes this modeling paradigm incompatible to ordinal features, whose associated traits are by construction sorted along an axis, in which case it is not reasonable to assume that a prototype points to one trait of a feature with full probability and to its nearest-neighbors with zero probability. Nonetheless, the paradigm is reasonably compatible with nominal features, in which case the distance between any two traits of one feature is anyway assumed to be the same.
The current study belongs to a preliminary, simplistic paradigm which makes use of what one may call "sharp prototypes". A more realistic paradigm, which would account for the probabilistic nature of the cultural biases, would make use of what one may call "diffuse prototypes". Using sharp prototypes comes at the cost of not having enough flexibility to reproduce the empirical, feature-level frequency distributions, with either of the two models, since every prototype corresponds to a probability distribution entirely peaked on one trait. Instead, using diffuse prototypes would allow this by enforcing, for every feature, that the empirical distribution is a linear combination of the prototype distributions. Nonetheless, as shown in Sec. III, both models are still able to reproduce the empirical average uniformity of the featurelevel frequency distributions, namely the AIVD quantity. This is partly due to both models making some use of uniformly-random trait generation, independently of the prototypes. This translates to a flat noise component in the probability distribution of every feature, which in a sense compensates for the rigid peaks of the sharp prototypes. When also considering the results of Sec. IV, the usage of sharp prototypes restricted to nominal variables appears to be enough as a proof of concept. This justifies further research towards the more sophisticated paradigm relying on diffuse prototypes. Although this is left for future studies, it is worth contemplating upon, in order to better understand the purpose, greater context and limitations of the current paradigm.
Working with diffuse prototype should go hand in hand with a method of inferring them from data. One can imagine doing this by applying a sensible clustering method on the empirical set of cultural vectors, followed by a sensible method of constructing one diffuse cultural prototype from every cluster, as a probabilistic entity that is representative of that cluster. The main advantage of this approach is that once the prototypes are constructed and provided as input to a sensible stochastic model, the artificial SCVs generated with this model would be close-to-representative of the same distribution in cultural space as the empirical SCV on which the method is applied in the first place. This means that the model would have a maximum-likelihood flavor, and could be used for generating synthetic data, which would also reproduce the feature-level frequency distributions.
By contrast, the approximation of sharp prototypes used here is too strong to be employed together with a method of inferring them from data. Instead, sharp prototypes are being assigned to randomly chosen positions in the given cultural space. On one hand, the fact that the prototypes are randomly chosen makes any model symmetric up to any permutation of the traits of any feature, as long as all features are nominal, which is the case here, a symmetry which is broken by an empirical SCV and also by an artificial SCV generated from a specific choice of the prototypes. On the other hand, the fact the prototypes are sharp does not allow for the exact frequency distribution of a specific feature to be reproduced, not even up to a permutation of the traits. Still, after parameter tuning, one should expect from a good model to provide a cultural space distribution whose rough "shape" is compatible with the empirical data, though the "orientation" and the structural details implied, for instance, by the feature-level distributions would not be compatible. This should reflect in roughly reproducing the universal LTCD-STCB patterns emphasized in Ref. [5]: one one hand, the formulation of the LTCD and STCB observables is also symmetric up to permutations of traits of any feature, and thus independent of the "orientation"; on the other hand, the empirical, feature-level frequency distributions should heavily depend on the specific data set, thus being of little relevance for the universal patterns.
There are various aspects that make the random generation of prototypes sensible for the purpose of the present work. First, results are evaluated for various values of the number of prototypes k, which is considered a free parameter for both the PG and MPG model. Second, the expected prototype-prototype distance is controlled for via parameter α. Third, for every choice of parameters, the prototypes are independently drawn for each realized cultural state in the set used for computing the model AIVD and SIVD quantities for fitting purposes. These compensate somewhat for not inferring the prototypes from empirical data.
In order to give an example of how the sharp prototypes approximation can be pushed beyond its limits, it is worth recalling that fitting the MPG model is not possible for k = 2 prototypes, as pointed out at the end of Sec. III: the α parameter can be successfully tuned in terms of the AIVD only for small β values, which do not allow for the subsequent fitting of the β parameter in terms of the SIVD. This is related to there being at least q = 3 traits associated to every nominal feature selected from the Eurobarometer data set, while there are only two, prototype-induced peaks in the model probability distribution of every feature, on top of the uniform component. Since the integrated probability of the uniform component cannot exceed 1/k by construction, all the distributions are bound to be relatively non-uniform, such that the empirical average uniformity is only attained for small-α (few coincidences between the prototype-induced peaks) and small-β (large uniform component) combinations. This does not happen for the PG model, as in this case the integrated probability of the uniform component can attain any value between 0 and 1. Nonetheless, if k > 2, the fitting of the MPG leads to generated cultural states that reproduce much better the universal empirical patterns than PG. This justifies considering MPG the successful model, while emphasizing the importance of the mixing ingredient, which validates the multiple self assumption.
When thinking in terms of the feature-level probability distributions, it might seem that the MPG and PG models are not that different from each other. As mentioned above, for both models, if there are k prototypes, the probability distribution of a certain feature would consist of k peaks of equal probability contents and of a uniform component associated to the explicitly random trait generation. Although the probability content of the uniform component of MPG is bounded from above, that of PG is not bounded in any way, so one might think that MPG is just a particular realization of PG. However, this reasoning is misleading, as it focuses on partial information encoded in the feature-level probability distributions, disregarding the rest of the information encoded in the complete cultural space distribution. With PG, a cultural vector whose trait, with respect to a certain feature, is generated under the probability peak of a certain prototype will have its trait generated, with respect to another feature, under the well-determined probability peak of the same prototype or under the uniform component. By contrast, with MPG, a cultural vector whose trait, with respect to a certain feature, is generated under the probability peak of a certain prototype, will have its trait generated, with respect to another feature, under the probability peak of any prototype -though with a higher likelihood under the peak of the dominating prototype -or under the uniform component. Thus, for the same choice of the prototypes and the same extent of explicitly random generation of traits (and consequently the same AIVD), PG implies a different level of crossfeature correlation and a different shape of the cultural space distribution than MPG. This conceptually explains the impact of the mixing ingredient.
Although this study does not attempt at providing a complete, mathematical theory of trait dynamics and formation, one can argue that the MPG model is an effective, 3 static description of (generic snapshots of) trait dynamics. This static description is inspired by Plural Rationality Theory which, although originating in cultural anthropology, does seem to make use of notions of both psychology and of a (complex) systems based understanding of society. In view of the discussion in Ref. [5], this suggests that, at a philosophical level, this theory attains a balance between the endogenous and exogenous aspects of trait dynamics, which is arguably needed for developing a complete theory. Although it is formulated in an a qualitative, informal way, Plural Rationality Theory and related research should be of use for developing a complete theory of trait dynamics, at least as a source of guidance and inspiration.

VI. SUMMARY AND CONCLUSIONS
This study was dedicated to developing and testing a stochastic model for generating cultural states that would be structurally similar to the empirical ones. The aim was to reproduce the universal, empirical properties pointed out in Ref. [5], while relying on some social science hypothesis. Following up on previous work, the idea of cultural prototypes was used for this purpose. The study first tested the hypothesis that each cultural vector is a partial realization of one prototype and random for the rest, which is what was previously assumed. This turned out to be insufficient for reproducing the empirical patterns. Instead, one has to assume that each cultural vector is a combination, or mixture of all prototypes, although still dominated by either of them, which is what the MPG model encodes. This additional, mixing ingredient is actually suggested by the same social science theories that inspired the prototypes idea in the first place. In this specific, social science context, this aspect is often referred to as "the multiple-self". These results provide indirect evidence for social science theories like PRT, that postulate, in one way or another, some notion of cultural prototypes, along with some associated notion of mixing.
Still, there is a certain rigidity in the way prototypes are currently formalized (Sec. V), related to the assumption that every prototype corresponds to one and only one value of every cultural variable, instead of corresponding to a probability distribution over the variable. This makes the cultural space distribution induced by the successful, MPG model generally incompatible with the cultural space frequency distribution with respect to which it is fitted. As it stands, MPG is is far from being a maximum-likelihood type of model and thus cannot be used to generate synthetic data. Nonetheless, this is arguably achievable once diffuse prototypes are used instead of sharp ones, while being inferred from the data rather than randomly chosen. In this sense, this work can be seen as an important step towards a realistic, maximum-likelihood model of empirical cultural states, and towards generating synthetic sets of cultural vectors. Moreover, MPG can be considered an effective description of the outcome of trait dynamics, since the generated cultural states seem to reproduce the generic structure of the empirical ones. The LTCD-STCB analysis, used for validating this effective theory, could also be used for validating a more fundamental, dynamical theory of culture. It appears likely that Plural Rationality Theory has more to say for aiding the development of such a theory.

Acknowledgements:
The authors are grateful to Maroussia Favre for her thoughtful comments on previous versions of this manuscript. AIB also acknowledges discussions with Ulf Dieckmann, Gerard 't Hooft, Petter Säterskog, Frank Takes, Leandros Talman, Michael Thompson, Marco Verweij and Jorinde v.d. Vis. DG acknowledges financial support from the Dutch Econophysics Foundation (Stichting Econophysics, Leiden, the Netherlands). This work was also supported by the Netherlands Organization for Scientific Research (NWO/OCW). This section describes the calculation of probabilities attached to sets of cultural prototypes employed by the PG and MPG models defined in Sec. II. These probabilities are collectively controlled via a parameter (α), which effectively dictates the expectation value of the average prototype-prototype cultural distance for one set of prototypes. The assignment of traits to prototypes is conducted independently for every feature, so the discussion is reduced to assigning probabilities to prototype-to-trait mappings at the level of a single feature. Furthermore, since prototype generation neglects empirical occurrence frequencies of specific traits, the problem is symmetric with respect to permutations of the traits, so the discussion is further reduced to assigning probabilities to "topologies" of prototype-to-trait mappings at the level of a single feature. Mathematically, such a topology is an "integer partition". Integer partitions turn out to be the mathematical objects to which elementary probabilities are to be assigned. Sec. A 1 explains the procedure for assigning the probabilities to integer partitions, while Sec. A 2 explains the procedure for generating the integer partitions.

Integer partition probabilities
Let I k be the set of all integer partitions of k elements, where an integer partition of k elements is an ordered sequence of integers that add up to k, also called "parts". Let the ordered sequence (k 1 , ..., k s ) ∈ I k be one generic element of this set, where s counts the number of non-zero parts. This notation implies that the parts are sorted for descending values k i ≥ k i+1 ∀i ∈ {1, .., s − 1} and that they add up to k = s i=1 k i . For instance, (3, 2, 2, 1) is an integer partition of 8 elements with 4 parts. For the purpose of this work, an element of the integer partition corresponds to one prototype. For a specific choice of the prototypes and a specific feature, an integer partition is a representation of how the prototypes are distributed over the traits of this feature, up to a permutation of these traits. Thus, when the fraction of traits that are randomly generated vanishes, the probabilities of the traits are just the normalized part sizes -in the example above, the ordered sequence of probabilities associated to the traits would be ( 3 8 , 2 8 , 2 8 , 1 8 ). Random trait generation then simply introduces a uniform, noise component to the feature probability distribution, whose contribution increases with the fraction of traits that are randomly generated. Thus, the integer partition is in any case a proxy for the feature probability distribution, regardless of which stochastic model is used.
Let c(k 1 , ..., k s ) be the "compactness" of integer partition (k 1 , ..., k s ), defined by: which counts the number of pairs of elements belonging to the same part. For instance, the compactness of integer partition (3, 2, 2, 1) is c(3, 2, 2, 1) = 3 2 + 1 2 + 1 2 + 0 2 = 11. The compactness thus counts the prototypeprototype coincidences for one feature. In light of the above paragraph, a small compactness implies a high uniformity for the feature probability distribution and thus a high value of the associated (feature-level) AIVD contribution.
Let I q k be the set of integer partitions of k elements of at most q parts (which implies that I q k ⊆ I k ). This definition is needed for working with features with range q < k. Furthermore, let c min k,q and c max k,q be the minimal and maximal compactness values attainable by the elements of I q k . These notions are needed for normalizing generic compactness values. They formally read: where the "." (dot) notation stands for "with the property that".
At this point, it is possible to define an non-normalized probability mass function parametrized by α over the discrete set of integer partitions I q k , function whose shape would depend on α. High α values correspond to integer partitions of high compactness values being favored over those of low compactness values, while low α values correspond to integer partitions of low compactness values being favored over those of high compactness values. For simplicity, the function is chosen to be monotonous when re-expressed in terms of compactness. A simple choice for such a function, denoted here by ρ α k,q , is given by: where the inner fraction linearly maps the compactness c(λ) from interval [c min k,q , c max k,q ] to interval [−1, 1], while the argument of the tan function linearly maps α from interval (0, 1) to interval (−1, 1), from where it is further mapped to (−∞, ∞) by the tan function. In this manner, the function is increasing with c(λ) for α > 0.5 (implying a relatively low expectation value of average prototype-prototype separation), the function is decreasing with c(λ) for α < 0.5 (implying a relatively high expectation value of average prototype-prototype separation) and the function is a constant of c(λ) for α = 0.5. The actual probability P α k,q (λ) associated to integer partition λ can then be obtained via a simple normalization: with the sum in the denominator being taken over all integer partitions in I q k .

Integer partition generation
Let I d = {0 I , 1 I } ∪ I 1 ∪ I 2 ∪ ... be the set of all integer partitions of any size, together with a "null" element 0 I and a "unity" element 1 I , which are meaningful in relation to the ⊕ operation defined below and are needed for keeping some of the following definitions compact and self-consistent. Let the integer partition "merging" ⊕ : I ×I → I, acting on two integer partitions of k a and k b elements, with s a and s b parts respectively, be defined in the following way: producing another integer partition of k = k a + k b elements and s = s a + s b parts, such that the sequence of parts in the resulting partition is a sorted merging of the two original sequences of parts. For instance: (3, 2, 2, 1) ⊕ (4, 2) = (4, 3, 2, 2, 2, 1). Moreover, any integer partition λ ∈ I satisfies λ ⊕ 0 I = 0 I and λ ⊕ 1 I = λ. Let the integer partition "multi-merging" ⊗ : I × P(I) → P(I), where P(I) is the set of all subsets of I, be defined by: where α, α 1 , ..., α σ ∈ I are all integer partitions. The ⊗ operation produces a set of integer partitions of σ elements from an initial set of integer partitions of the same size and another integer partition α, by merging α with each element α i in the initial set via the ⊕ operation.
Relying on the notions above, the following recursive definition of function sip(k, m L , m V ) : N × N * × N * → P(I) encodes the procedure for generating the set of integer partitions of k elements, of maximally m L parts, with maximal part value m V : definition inspired by Ref. [20], where the order of the four cases matters, in the sense that one case is considered only if none of the conditions of the above cases is valid. The last line returns the set resulted from the reunion "∪" of all sets of integer partitions of type , where x spans the indicated interval. This general formulation, which also takes the maximal part value m V as argument, is required for a compact recursive definition. But of actual interest for this work is the set of integer partitions of k elements and maximal part value q, I q k , given by: where the last part of the expression takes out the null and/or the unity element, which might be present in the set of integer partitions as leftovers from the computation. Here we explicitly show how the sip function works when calculating the set of integer partitions of 4 elements of maximally 3 parts, given by I  This section explains the analytic calculation for the expectation value of the average inter-vector distance (AIVD) for sets of cultural vectors generated using either the PG or MPG model. The first part of this section just gives the essential formulas -Eqs. (B1) and (B2)) are common for the two models, with the difference being captured in the comparison between Eqs. B3 and B4, while the second part gives the proof Eq. (8), which is the basis for Eq. (B2).
The expectation value of the AIVD, as a function of the three model parameters k, α, β is given by the average over the feature-level expectation values: where the sum goes over all possible values ranges q and n q is the number of features with range q, with q n q = F being implicitly satisfied, where F is the number of features. Note that the feature-level contribution also depends on q. In turn, this contribution is given by: which is essentially a weighted averaging of Eq. (8) over the set of integer partitions (k 1 , ..., k s ) ∈ I q k , where the weights are the integer partition probabilities P α k,q (k 1 , ..., k s ). These are calculated in the manner described in Sec. A 1, while the integer partitions themselves are generated in the manner described in Sec. A 2. The set of p i 's of Eq. (8) depends on the integer partition in the manner illustrated between the braces of Eq. (B2), where the first term accounts for the s traits that are covered by the (non-zero) elements of the integer partition, namely those under the peak(s) of one (or more) prototype and under the flat noise component, while the second term accounts for the remaining q − s traits, namely those that are only under the flat noise component. The dependence on whether the PG or the MPG model is used is captured by π β,F , which is the average fraction of traits directly copied from prototypes, given by: for PG, where the "round" function accounts for the fact that only integer numbers of traits can be copied, and: for MPG, where w iterates over all values of W β k+1 , which is a large sequence of lowest MPG discrete weights (see Sec. II), which are numerically generated a-priori, for each used combination of (k, β) values. |W β k+1 | is the number of elements in this sequence of discrete weights. For this study, |W β k+1 | = 10 5 elements were generated for every (k, β) combination, which allows for a very precise numerical calculation of π k β,F in the case of MPG.
The consistency between the analytical AIVD calculation explained above and the numerical calculation is illustrated here via Fig. 6. The expected AIVD value is shown as a function of the β parameter, for 5 values of the α parameter and 3 values of the k parameter, for both the PG and MPG models. The analytical values are shown by the lines, while the numerical ones are shown by the dots, which have small, almost indiscernible error bars attached. For the numerical case, 50 sets of N = 500 cultural vectors with are generated for each combination of parameters. Note that the numerical profiles follow closely the analytical ones, with small deviations that are consistent with the expected fluctuations of the mean.
It is now worth presenting a proof of Eq. (8), on which Eq. (B2) is based. Consider a feature with q traits and a set of a-priori probabilities {p 1 , ..., p q } attached to them. Then, the entry of each, generated cultural vector with respect to this feature is an independent, random choice from the q traits according to the probability mass function (p 1 , ..., p q ). Thus, the expected AIVD contribution from N cultural vectors is given by: where f N , x1,...,xq p1,...,pq denotes probability that the N independent, random variables fill the q traits with the frequency distribution (x 1 , ..., x q ), given the associated probability distribution (p 1 , ..., p q ), where q i−1 x i = N . This is conventionally called the multinominal distribu-tion. In the above derivation, S i stands for the summation over all elements of the multinominal except that which has a certain, x i number of entries for the ith trait, which can be further manipulated: This shows that S i is just a term of the binomial distri-bution. By inserting the final expression of Eq. (B6) in the final expression of (B5), one gets: which concludes the proof of Eq. (8), after using the well known expressions for the first and second moments x i and x 2 i of the binomial distribution. Note that the dependence on N cancels out during the derivation.
Another, arguably shorter proof can be formulated with the aid of indicator functions of the type I i (x), which gives 1 if cultural vector x is an entry of trait i and gives 0 otherwise. One can express the feature-level AIVD of one, generic set of cultural vectors in terms of indicator functions and write the expected, feature-level AIVD as an average of this expression. The p 2 i part of Eq. (8) then appears from an averaging of the I i (x)I i (y) product, where x and y are two arbitrary cultural vectors.

Appendix C: Fitting algorithm
This section explains the procedure used for simultaneously tuning the α and β parameters of either of the two stochastic models of culture, such that a match is obtained between the model and the empirical data, in terms of the averages of the AIVD and SIVD observables: for a fixed number of prototypes k, assuming that either of the two equalities above is satisfied when there is an overlap between the uncertainty range associated to the quantity on the left side and that associated to the quantity on the right side.
There are multiple reasons why this problem is challenging: • an analytical formula for the < SIVD(α, β) > quantity could not be found (B4)) 4 , it does not allow for inverting the function and for analytically solving the system 4 Which implies that the specific uncertainty range of < AIVD(α, β) > has a null width.
• the < SIVD(α, β) >, AIVD emp and SIVD emp quantities have non-vanishing uncertainty ranges attached to them Assuming that there exists a unique solution to the above system, a numerical approach for solving it is in order. The method used here relies on a nested, 2level, adapted bisection method. The first (inner) level of the method takes care of fitting, via bisection, the first quantity for a fixed β -it finds the α value for which < AIVD(α, β) >= AIVD emp is satisfied for a given β. The second (outer) level of the method takes care of fitting, via bisection, the second quantity -it finds the β for which < SIVD(α(β), β) >= SIVD emp is satisfied, where α(β) is provided by the first level. This choice of assigning the AIVD and SIVD observables and the α and β parameters to the two levels in this manner is numerically convenient for several reasons. First, the AIVD can be much more easily computed via the analytical formula, such that assigning it to the first level, which is repeated multiple times (once for each value of β that the second level samples) is more effective. Second, the model AIVD turns out to be relatively insensitive to β for relatively many combinations of values for the k and α parameters, such that fitting AIVD in terms of α within the first level makes more sense.
In addition to adaptations required by the 2-level scheme, other adaptations with respect to the traditional bisection method are needed for allowing it to work with model and empirical uncertainties, as well as to enhance the numerical precision for the < SIVD(α, β) > quantity when needed, to the extent needed. Moreover, in addition to statistical errors originating directly in the empirical uncertainties of the AIVD emp and SIVD emp quantities and in the numerical uncertainty of the model SIVD quantity, the second level of the method is also affected by "systematic errors" on < SIVD(α(β), β) >, originating in the fitting procedure at the first level, and indirectly in the empirical uncertainty of AIVD emp -which for all practical purposes can be assumed fixed, thus motivating using the term "systematic" for its propagation to the model SIVD at the second level.
In order to address all these challenges in a self consistent way, the method developed here turns out to be quite sophisticated, which is why it is explained in detail in the following four sections. Specifically, Sec. C 1 focuses on the first fitting level, Sec. C 2 focuses on the second fitting level, Sec. C 3 describes how various subproblems invoked by the previous two sections are addressed, while Sec. C 4 describes how the tools described in Sections C 1, C 2 and C 3 are used for producing the results presented in Sections III and IV. The method is potentially of use for addressing other problems that are formally similar to the problem presented here, although certain adaptations might be needed.
Since the method has mostly an algorithmic nature, much of it is explained via pseudocode, such that a few conventions that will be extensively used below and that are not necessarily standard are worth mentioning. First, the "=" symbol is used with double meaning: in a normal statement (such as "a = b") it is to be interpreted as an assignment (of the value of variable b to variable a); in the header of an if or while statement (such as "if a = b") it is to be interpreted as a check (of whether the values of a and b are equal). A variable is implicitly declared when it first appears, either on the left side of an assignment or in the header of a function definition (in which case it is also called an argument or function parameter); the scope of the variable is the part of the function below and to the right of the place where it first appears. Functions are distinguished from each other through their names, their numbers of arguments and the types of those arguments 5 On the other hand, the arguments of a function are distinguished from each other via their order. Some variables are actually ordered sequences of other variables, which in turn are denoted by (x 1 , .., x n ) notation. In the same spirit, assignments of the type X = (x 1 , .., x n ) are referred to as "variable compressions", while those of the type (x 1 , .., x n ) = X are referred to as "variable decompressions". These allow for keeping the pseudocode compact, while still rigorous. An uncertainty range refers to an interval [x−δx, x+δx], where x is a mean and δx is an error relying (directly, or indirectly) on a standard mean error calculation, the uncertainty range being formally encoded by the sorted (x, δx) sequence. Note that the square brackets "[,]" are consistently used to denote an interval of real numbers, while the round brackets "(,)" are used to denote an ordered sequence of two or more elements. Finally, it is worth noting that the pseudocode relies heavily on function calls and on recursive definitions, and that there is a certain parallelism between the functions defined in section C 1 and C 2.

First level fitting
This section presents the algorithm part concerned with the first fitting level. The algorithm is split in three main functions: Fit-1, Bisect-1, Displace-1, all of them returning the same type of information. Fit-1 always calls Bisect-1, while the latter may or may not call Displace-1 at any stage, which in turn may or may not call Bisect-1. The pseudocode also invokes a two constants, which are assumed to be known a-priori and available for use anywhere in these three functions. First, δα controls the desired resolution (δα is essentially a grid-spacing) in the α parameter, which is here set to the inverse of the number of features: δα = 1 F 6 . Second, AIVD emp is the AIVD uncertainty range for the empirical data.
Function Fit-1 acts as an interface for the first-level fitting, which consists of tuning the α parameter, for given values of β and k, such that the AIVD quantity matches the empirical value. Here, β is a real number belonging to [0, 1] while k is a strictly positive integer number. The method returns the left (α L ) and right (α R ) margins of the tightest α interval found, together with the estimated α match within this interval assuming linearity (α fit ) and an associated error (α err ). It assumes that the empirical AIVD can actually be uniquely matched by varying α, for the given values of β and k. The method essentially carries out some initializations (Lines 2,3), before passing the task to Bisect-1. return Bisect-1(α L , α R , AIVD L , AIVD R , β, k) 5: end function Function Bisect-1 is mostly a typical, recursive implementation of the bisection method. This sequentially narrows down the [α L , α R ] interval, such that at each stage the empirical AIVD is contained, namely: min(AIVD L , AIVD R ) < AIVD emp < max(AIVD L , AIVD R ) is satisfied, where the AIVD L , AIVD R values correspond to the left and right margins of the α interval. Here, α L , α R , AIVD L , AIVD R are real numbers belonging to [0, 1] while β and k are of the same type as in Fit-1. It returns the same type of information as Fit-1. The method converges, the fitting being considered complete, when the interval has reached the δα resolution limit, in which case estimations for an "ideal" α inside this interval α fit and its error α err are made and returned together with the boundaries of the interval (lines [3][4][5][6]. Moreover, the method may also call Displace-1 in case the AIVD M value corresponding to the computed midpoint α M happens to fall within the AIVD emp uncertainty range (lines 8-10) -this is needed in order to keep the output format consistent and the final α interval relatively narrow. Otherwise, the method decides to zoom in (by calling itself) on either the left or right halves of the interval, depending on the position of AIVD emp with respect to AIVD L , AIVD M and AIVD R (lines 11-16). return Bisect-1(α L , α R , AIVD L , AIVD R , β, k) zooming in on selected interval 17: end function Function Displace-1 attempts to displace the midpoint α M previously calculated at some stage in Bisect-1, in a way that its associated AIVD would fall outside the empirical uncertainty range. This function has all the arguments of Bisect-1 and α M as an additional one, which is a real number belonging to [0, 1]. It returns the same type of information as Fit-1. The method first computes a "secondary" midpoint α M to the left of α M and its corresponding AIVD M value. If the resolution limit δα is not reached and AIVD M falls outside the AIVD emp range, Bisect-1 is applied further to the [α M , α R ] interval (lines 2-11). Otherwise, the analogous procedure is applied on the right side (12)(13)(14)(15)(16)(17)(18)(19)(20)(21). If the procedure fails to provide a convenient, secondary midpoint on either side, the fitting is considered complete with the current [α L , α R ] interval and the α fit , α err estimates made like in Bisect-1 (lines 22-23). (α fit , α err ) = InternFitLin-1(α L , α R , AIVD L , AIVD R , AIVD emp ) 23: return (α L , α R , α fit , α err ) fitting complete 24: end function

Second level fitting
This section presents the algorithm part concerned with the second fitting level. Each of the three functions of the first fitting level (Sec. C 1) has a correspondent here: Fit-2, Bisect-2, Displace-2, all of them returning the same type of information 7 , each of them having a similar, structure, purpose and role to the correspondent within the first fitting level. Additionally, this section presents the pseudocode for a fourth function, NumSIVD, which carries out the numerical SIVD calculations. In addition to the two constants introduced at the first level, the second level pseudocode invokes two other constants, which are also assumed to be known apriori and available for use anywhere in these four functions. First, δβ is the desired resolution in the β parameter, which is here set to the inverse of the number of features: δβ = 1 F . Second, SIVD emp is the SIVD uncertainty range for the empirical data.
In relation to the first three functions, the descriptions below attempt to mostly emphasize the elements that 7 The type of information returned by the three functions at a second-level fitting is different than that of the three functions at the first-level fitting, and actually more complex.
come in addition with respect to their first-level correspondents. Some of these elements have a repetitive nature and are worth explaining before moving to the specific description of each function. First, the (generic)β X notation (where "X" can stand for "L", "R" or "M") denotes the (generic) "composite fitting information"β X = (β, α L , α R , α fit , α err ) X , which is a 5-tuple consisting of a β value together with the associated four values returned by a (generic) call Fit-1(β, k) for that specific β and some arbitrary k. Second, whenever an "SIVD X " variable appears in the first three functions (where "X" is again a generic label), except for SIVD emp , it actually denotes the (generic) "composite SIVD information" SIVD X = ((SIVD fit L , SIVD err L ), (SIVD fit R , SIVD err R )) X , which is a pair of pairs of real numbers, each inner pair corresponding to a model SIVD uncertainty range associated to one margin of an α interval returned by a call to Fit-1, while both inner pairs have the same β. This schematically reads: Third, any (generic) call NumSIVD(β, k) is necessarily preceded by an associated (generic) call Fit-1(β, k) and by an associated (generic) variable compressionβ = (β, α L , α R , α fit , α err ), the last two being needed for pro-ducing the composite fitting informationβ. Fourth, whenever a piece of composite SIVD information appears in a call to Ord-2 or Match-2, it is accompanied by an associated piece of composite fitting information, which allows for the mean, statistical error and systematic error of in the model SIVD to be all reconstructed within, for a given combination of β and k.
Function Fit-2 acts as an interface for the second-level fitting, which consists of tuning the β parameter, for a given value of k, such that the SIVD quantity matches the empirical value, relying on an underlying tuning of the α parameter in terms of the AIVD quantity (using Fit-1). Here, k is a strictly positive, integer number.
The method returns the composite fitting information associated to the left (β L ) and right (β R ) margins of the tightest β interval found, together with the estimated β match within this interval (β fit ) and its associated error (β err ). It assumes that the empirical SIVD can actually be uniquely matched by varying β and α, for the given value of k. After checking that there exists a meaningful [β L , β R ] interval for which the first-level fitting is possible (lines 2,3), the method conducts the numeric SIVD calculations on both sides of the interval (line 6), preceded, on each side, by the first level fitting and the decompression (lines 4,5, as explained above), in order to finally pass the task to Bisect-2.
1: function Fit-2(k) 2: initializing the β-interval 3: if β L < β R then 4: return FittingImpossibleError 10: end function Function Bisect-2 is another recursive implementation of the bisection method, which sequentially narrows down the [β L , β R ] interval, such that at each stage the empirical SIVD is contained. Here,β L ,β R are 5-tuples of real numbers encoding the left and right pieces of composite fitting information, SIVD L , SIVD R are the pairs of pairs of real numbers encoding the left-β and right-β pieces of composite SIVD information, while k is of the same type as in Fit-2. It returns the same type of infor-mation as Fit-2. Like Bisect-1, the function consists of a part concerned with convergence (lines 4-7), a part concerned with the jump to Displace-2 (lines 11-13) and a part concerned with choosing between the left and right β subintervals and with zooming in on the chosen one (lines [14][15][16][17][18][19]. Note the additional statements concerned with decompressing the composite fitting information (line 2) and with preparing the numeric SIVD calculations at the midpoint (lines 8-9).
1: function Bisect-2(β L ,β R , SIVD L , SIVD R , k) 2: if ¬Distinct(β M , β L , β R ) then 5: (β fit , β err ) = InternFitLin-2(β L ,β R , SIVD L , SIVD R , SIVD emp ) 6: return (β L ,β R , β fit , β err ) return Bisect-2(β L ,β R , SIVD L , SIVD R , k) zooming in on selected interval Function NumSIVD numerically generates a piece of composite SIVD information with a precision that is as high as possible. Here,β is a 5-tuple of real numbers encoding a composite fitting information, while k is a positive integer number. One sequence of SIVD values is numerically generated (lines 4 and 11) for each of the two margins of the α interval (contained inβ), for the given β (also contained inβ) and the given k. An uncertainty range is obtained from each of the two sequences (lines 5 and 13). These two uncertainty ranges are used together with the information inβ to produce estimates for an average, a statistical error and a systematic error that areβ-specific rather than (α, β)-specific (lines 6,7 and 14,15). The number of SIVD values in the two se-quences is increased and the calculations are repeated as long as the condition in line 9 remains true, namely as long as: the statistical error is higher than the systematic error, the desired separation between the model and empirical (statistical) uncertainty ranges is not reached and the maximal SIVD sequence length is not reached. The desired separation and the SIVD sequence length are controlled via variables s and n, initialized in line 2 -the initial values of these variables, as well as the upper bound on the latter are hard-coded, as visible in the pseudocode, and have been decided after some experimentation with NumSIVD, but they are not essential for the actual outcome. Also note the decompression of the composite fitting information (line 3) and the decom-pression of SIVD uncertainty ranges (lines 8 and 16). (β, α L , α R , α fit , α err ) =β 4: SIVD seq L = GenSeqSIVD(α L , β, k, n); SIVD seq R = GenSeqSIVD(α R , β, k, n) 5: SIVD syst = CompSystErr(α L , α R , α err , SIVD L , SIVD R ) 8: (SIVD avg , SIVD stat ) = SIVD; (SIVD avg emp , SIVD stat emp ) = SIVD emp 9: while SIVD stat > SIVD syst ∧ (SIVD stat + SIVD stat emp > |SIVD avg emp − SIVD avg |/s) ∧ n < 350 do (SIVD avg , SIVD stat ) = SIVD 17: end while 18: return (SIVD L , SIVD R ) 19: end function

Used functions
This section describes functions that are used by the pseudocode in sections C 1 or C 2 but are not described there. The following is a list of functions for which the pseudocode is also provided, following each text description.
Function InterfitLin-1 fine-tunes the α parameter such that AIVD emp is matched, relying on a linear ap-proximation of the model AIVD as a function of α within the (α L , α R ) interval, using the boundary values AIVD L and AIVD R . Its arguments are of the same type as those of InterFitLin (described below), except that AIVD L and AIVD R are real numbers rather than uncertainty ranges. The output structure is entirely the same as that of InterFitLin. It is essentially a first-level fitting interface for InternFitLin, which is called after specifying that the errors associated to AIVD L and AIVD R are zero. return InternFitLin(α L , α R , AIVD L , AIVD R , AIVD emp ) 4: end function Function InterfitLin-2 fine-tunes the β parameter such that SIVD emp is matched, relying on a linear approximation of the model SIVD as a function of β within the [β L , β R ] interval, using the boundary information stored in SIVD L and SIVD R . Its arguments are of the same type as those of InterFitLin (described below), except thatβ L andβ R are 5-tuples or real numbers rather than real numbers and SIVD L and SIVD R are pieces composite SIVD information rather than uncertainty ranges. The output structure is entirely the same as that of InterFitLin. It is essentially a second-level fitting interface for InternFitLin, which is called after carrying out the following two operations: computing the mean, statistical error and systematic error on each of the two margins of the β interval, using the right combination of composite fitting information and composite SIVD information (lines 2,3); compressing information into an SIVD uncertainty range for each of the two margins, after choosing the highest among the two errors for each margin. (SIVD avg L , SIVD stat L , SIVD syst L ) = MeanStatSyst(β L , SIVD L ) 3: SIVD L = (SIVD avg L , max(SIVD stat L , SIVD syst L )) 5: return InternFitLin(β L , β R , SIVD L , SIVD R , SIVD emp ) 7: end function Function Match-1 checks whether AIVD (real value) falls within the uncertainty range specified by AIVD emp .
It acts as an interface for Match (described below) within the first-level fitting scheme.  return Ord(AIVD L , AIVD R ) 3: end function Function Ord-1 (second version) checks whether AIVD (real value) is smaller than the average stored in AIVD emp (uncertainty range), acting as an interface for Ord within the first-level fitting scheme. return Ord(AIVD, AIVD avg emp ) 4: end function Function Ord-2 (first version) checks whether the average stored in the SIVD uncertainty range obtained from β L (composite fitting information) and SIVD L (composite SIVD information) is smaller than the average stored in that obtained fromβ R (composite fitting information) and SIVD R (composite SIVD information), acting as an interface for Ord within the second-level fitting scheme. (SIVD avg R , SIVD stat R , SIVD syst R ) = MeanStatSyst(β R , SIVD R ) 4: return Ord(SIVD avg L , SIVD avg R ) 5: end function Function Ord-2 (second version) checks whether the average stored in the SIVD uncertainty range obtained fromβ (composite fitting information) and SIVD (com-posite SIVD information) is smaller than the average stored SIVD emp , acting as an interface for Ord within the second-level fitting scheme.  (β, α L , α R , α fit , α err ) =β (SIVD avg , SIVD stat ) = SIVD 7: return (SIVD avg , SIVD stat , SIVD syst ) 8: end function The following is a list of functions for which only text explanations are provided in schematic way, sometimes accompanied by figures.
• Init-1(δα): gives the left and right boundaries of the largest possible interval for which the α parameter is compatible with the stochastic model in use, given the grid spacing δα input: δ is a real number in practice it returns (δα, 1 − δα) regardless of whether PG or MPG is used • Init-2(δβ, k, AIVD emp ): gives the left and right boundaries of the largest possible interval, if any, for which the β parameter allows for the (first level) fitting of AIVD(α) to successfully take place, given the grid spacing δβ input: δβ is a real number, k is a positive integer and AIVD emp is an uncertainty range assumes that there exists at most one β interval [β L , β R ] for which there exists an α such that < AIVD > k α,β = AIVD emp is satisfied starts from the largest interval allowed by the model and independently adjusts each of the two boundaries via a branching algorithm, until the desired interval is reached returns two (incompatible) boundaries β L > β R if such an interval does not exist • Middle(l, r, δ): computes the value closest to the average between l and r, on a grid of spacing δ input: l, r, δ are all real numbers assumes that the interval length l − r is equal to an integer times δ estimates the mean and error in SIVD corresponding to α fit based on the values attained for α L input: α L , α R , α fit are real numbers, while SIVD L , SIVD R are mean-error pairs of real numbers uses on a linear interpolation within the [α L , α R ] interval, separately for the mean and for the error • CompSystErr(α L , α R , α err , SIVD L , SIVD R ) estimates the systematic error SIVD syst of the SIVD quantity induced by the error α err (associated to fitting the α parameter in terms of the AIVD quantity), assuming that SIVD is a linear function of α within the [α L , α R ] interval input: α L , α R , α err are real numbers while while SIVD L , SIVD R are mean-error pairs of real numbers encoding the theoretical uncertainty ranges on the left and right boundaries -SIVD syst is computed based on geometrical considerations, in the manner illustrated in Fig. 7(b)

Algorithm usage
This section explain how the formalism explained throughout section C is effectively used for producing the results shown in Sections III and IV.
First, the formalism is used for producing the plots shown in Fig. 3. For either PG or MPG, for a specific k value and a specific β on-grid value, the drawn model SIVD uncertainty range is obtained after the following computational steps: 1: (α L , α R , α fit , α err ) = Fit-1(β, k) executing 1st-level fitting 2:β = (β, α L , α R , α fit , α err ) creating composite fitting information 3: SIVD = NumSIVD(β, k) numeric SIVD calculations 4: (SIVD avg , SIVD stat , SIVD syst ) = MeanStatSyst(β, SIVD) which provides the values of the SIVD average SIVD avg , the SIVD statistical error SIVD stat and the SIVD systematic error SIVD syst . One can then place a point at coordinates (β, SIVD avg ), within the respective k curve, with an error bar given by the maximum between SIVD stat and SIVD syst .
Second, the formalism is used for providing the bestfitting, on-grid values for the α and β model parameters, which are used for generating sets of cultural vectors on which the LTCD-STCB analysis is applied in Fig. 4. For either PG or MPG and for a specific k value, the following procedure is followed: 1: (β L ,β R , β fit , β err ) = Fit-2(k) Executing 2nd-level fitting 2: (β L , α L L , α R L , α fit L , α err L ) =β L Decompressing left-β composite fitting information 3: (β R , α L R , α R R , α fit R , α err R ) =β R Decompressing right-β composite fitting information 4: if β fit − β L < β R − β fit then if α fit R − α L R < α R R − α fit R then 14: α = α L R choosing α L R , since it is closer to α fit