From Topic Networks to Distributed Cognitive Maps: Zipfian Topic Universes in the Area of Volunteered Geographic Information

Are nearby places (e.g. cities) described by related words? In this article we transfer this research question from the field of lexical encoding of geographic information onto the level of intertextuality. To this end, we explore Volunteered Geographic Information (VGI) to model texts addressing places at the level of cities or regions with the help of so-called topic networks. This is done to examine how language encodes and networks geographic information on the aboutness level of texts. Our hypothesis is that the networked thematizations of places are similar, regardless of their distances and the underlying communities of authors. To investigate this we introduce Multiplex Topic Networks (MTN), which we automatically derive from Linguistic Multilayer Networks (LMN) as a novel model, especially of thematic networking in text corpora. Our study shows a Zipfian organization of the thematic universe in which geographical places (especially cities) are located in online communication. We interpret this finding in the context of cognitive maps, a notion which we extend by so-called thematic maps. According to our interpretation of this finding, the organization of thematic maps as part of cognitive maps results from a tendency of authors to generate shareable content that ensures the continued existence of the underlying media. We test our hypothesis by example of special wikis and extracts of Wikipedia. In this way we come to the conclusion: places, whether close to each other or not, are located in neighborhoods of the topic universe that span similar subnetworks.


INTRODUCTION
In this article, we explore crowd-sourced resources for automatically characterizing geographical places with the help of so-called topic networks. Our goal is to model the thematic structure of corpora of natural language texts that are about certain places seen as thematic frames. This is done in order to automatically compare the thematic structures of corpora of texts about these places, which will be represented as topic networks. In this way we want to investigate the regularity or systematicity according to which geographical objects (i.e. cities and regions) are dealt with, especially in online communication.
Our work relates to what is described by Crooks et al. [27] as a novel paradigm of modeling "urban morphologies". We not only add special wikis, such as regional and city wikis, as candidates to the resources listed in [27]; we also introduce a novel method for modeling their content.
This concerns local media of collaborative writing about places [cf. 26] which contain everyday place descriptions [24] authored and networked according to the wiki principle. The corresponding wikis and the subgraphs of Wikipedia that we additionally analyze manifest Volunteered Geographic Information (VGI) [50,51,57] and thus relate to what is called the wikification of Geographical Information Systems (GIS) [131]. VGI is "completing traditional authoritative geographic information" [71], an information source which is still "underutilized" in geography [124] as a source of big textual data [71], making natural language processing an indispensable prerequisite for its analysis. According to Hardy et al. [57], authoring VGI has a spatial component in the sense that people likely write about local content, though this holds for Wikipedia only to a minor degree [60]. This spatial component can be accompanied by a lack of quality assurance, which makes VGI susceptible to deficiencies and turns it into a distorted resource of still unknown extent [51]. In any event, the biased coverage of VGI is a characteristic of resources like Wikipedia, so that the same region can be displayed very differently in its various language editions [53], a sort of biasing which is typical for user-generated content. Nevertheless, Hahmann & Burghardt [54] show that more than 50% of the articles in the German Wikipedia contain geo-referenced data (at least indirectly via links to other articles), so that such media can be regarded as rich resources of VGI. Moreover, Goodchild & Li [51] point to the fact that crowd-sourcing or, more precisely, crowd-curation [70], as enabled by wikis, is a means of quality assurance.
We follow this concept and assume that geographic data, as manifested linguistically in online media, are a valuable resource for investigating how communities form a common sense for addressing places of common interest. In line with Davies [29,41] we additionally assume that "[a]s people communicate more about a place, social consensus will create increased similarity between and within people's judgments of it." However, we also assume that the latter similarity can affect communications of different communities about different places. In this way, we assume a kind of horizontal self-similarity [100] of the thematic structure of online media, which is more or less independent of the underlying theme and the community. That is, our hypothesis on the theming of places is as follows: Thematizations of different places at a certain level of thematic abstraction tend to be similar among each other (rather than being dissimilar) in the sense that (1) they focus on similar topics, (2) they network these topics in similar ways, and (3) they exhibit a similar skewness of this thematic focus, regardless of whether the underlying media are generated by different communities and whether these communities address related or unrelated places at near or distant spaces. The intuition behind Hypothesis 1 is that thematizations of places in web-based communication seem somehow thematically redundant: in reporting, for example, on the cities in which people live, authors may aim to emphasize the special character of these places. It seems, however, as if a thematic trend is breaking ground that ultimately makes such reports appear thematically very similar. Whether or not this intuition reflects a trend that can actually be observed, specifically in the field of wiki-based media, is something this study is intended to clarify.
From this point of view, it is obvious that Hypothesis 1 is only a starting point which in itself needs further clarification in order to be testable: similarity, for example, is a highly context-sensitive attribute [94] that needs further definitional specifications in order to be computable. Likewise the concept of thematization (theme or topic), a concept which according to [3] has so far found comparatively little attention in linguistics, is not yet specified in Hypothesis 1. Thus, an appropriate elaboration and concretization of Hypothesis 1 is one of the main tasks of the present paper. To this end, the paper develops a generic topic network model in conjunction with a measurement procedure which will specify both the notion of similarity (which will be defined in terms of the graph similarity of topic networks) and that of the thematization of places (which will be defined in terms of topic labeling and topic networking). This topic network model will allow Hypothesis 1 to be reformulated and concretized in the form of variants (i.e., Hypotheses 2, 3 and 4), which will be presented in the third part of the paper (in Section 3.2.7) and whose formulations presuppose the topic network model that this paper develops in the preceding sections. The skewness that is mentioned by Hypothesis 1 reminds one of a Zipfian process, according to which a few topics dominate, while the majority of candidate topics is underrepresented or disregarded. Therefore, we speak of Zipfian thematic universes, which are spanned by the thematization of the same places in online media such as special wikis of the sort studied here.

Fig. 1. Illustration of the hypothesis of Louwerse & Zwaan [90] saying that language encodes geographical information: the places p, q are expressed in the discourses x, y, from which the topic representations α, β are computationally derived. Places are structured into systems of networked rhemes or subtopics. The conceptual relatedness of p and q is grounded in the relatedness of the rhemes p_i and q_m and modeled by the relatedness of the derived topics α and β modeling these rhemes. According to the semiotic triangle, we assume that the relation of signs (here: texts) to their referents (here: spaces) is mediated by sign processes. We use dashed arcs to express the indirect relation of the former to the latter. In lexical variants of this approach, p and q are preferably denoted or described by some words w_k, w_{k+l} of the underlying lexis, which are syntagmatically or paradigmatically associated and modeled by corresponding types. Framed numbers indicate relations that potentially parallelize each other; s.r. means statistically related.

By the term topic we refer to the notion of aboutness of texts [3,143]. From a linguistic point of view, the terminology of Hypothesis 1 may seem confusing, since it refers to places as what is given while using topic for what is said about these places. The reason is that linguistics distinguishes between what is given (theme or topic) and what is said about it (rheme, comment or focus) in a given piece of text [3,20,28,65]: a mention of a city like Vienna, for example, can be connected with certain subtopics (e.g. classical music), which characterize this place rhematically by providing new information about it. The latter distinction is meant when we relate subtopics in the role of rhemes to places in the role of topics in the linguistic sense. Thus, when talking about topics as part of a computational model, we will use the term topic (topic_2), while when talking about places as topics in the linguistic sense (topic_1), we will use the term theme and speak about its rhemes as its subtopics modeled by topics (topic_2) as units of our model. This scenario and its relation to Hypothesis 1 is depicted in Figure 1. It shows a generalization of a hypothesis of Louwerse & Zwaan [90] according to which language encodes geographical information: the places p and q, which are understood as conceptual units (i.e. mental models), are described by or expressed in two discourse units (texts, dialogs etc.) x and y. From the latter units, the topic representations α and β are derived by means of a computational model (e.g., Latent Dirichlet Allocation (LDA) [15] or the topic network model introduced in Section 3). While such derived topics are part of the computational model, the underlying discourses belong to the modeled system.
We assume that the conceptual unit p (q) is structured into a system of networked rhemes or subtopics p_i (q_m). Ideally, the derived topic α in Figure 1 is a valid model of one of the rhemes of place p (e.g. p_i) and β of one of the rhemes of place q (e.g. q_m). If we assume now that p and q are conceptually related (e.g. similar) to each other, then the linguistic encoding hypothesis implies that this is possibly reflected by a relatedness (e.g. similarity) relation among some rhemes of these places (e.g. by the relatedness of p_i and q_m). From the point of view of modeling, this relation is ideally mapped by the relatedness (e.g. similarity) of the derived topics α and β. We assume that conceptual relations between places can be parallelized by relations of physical proximity or distance between spaces that are mentally modeled by these places. If one additionally assumes that proximity in space correlates with relatedness in conceptual space (the less distant, the more similar, for example), one obtains a linguistic variant of Tobler's so-called first law (see Section 2). If we look at the literature (see Section 2), we find that the approaches in this area differ in terms of the linguistic level at which they observe the linguistic encoding of platial [70] relations: for example, at the level of intertextually linked texts, at the level of the topics these texts are about, or at the level of lexical elements used by these and other texts to deal with the latter topics. In lexical variants of this approach, the places p and q, for which we assume that they are conceptually related, are preferably referred to or described by means of lexical items w_k, w_{k+l} (see Figure 1) of the underlying lexis that are syntagmatically or paradigmatically associated. From the point of view of modeling, we then have to assume two types (as models of the words w_k, w_{k+l}) for which we automatically detect, for example, their (paradigmatic) closeness in semantic space [cf.
30,121] or the similarity of their (syntagmatic) co-occurrence statistics [cf. 89]. From this analysis we obtain a series of reference points or means for encoding geographical information about conceptual relations (see [1] in Figure 1) of places. More precisely, this concerns a series of possible parallelizations of such relations, which may ultimately be parallelized by relations between the spaces designated by these places (for the numbers in brackets see Figure 1): at the level of the modeled system, this refers to thematically linked rhemes, intertextually linked discourse units (e.g. texts) and to syntagmatically or paradigmatically linked words ([1]). From a modeling point of view, we distinguish the statistical relatedness of types or of topics as candidate parallelizations ([1]). Beyond that, we find the parallelization of the relatedness of rhemes and words on the one hand and of types and topics on the other ([2], [3]), as well as that of the relatedness of words on the one hand and of types on the other ([4]). The parallelization of the relatedness of rhemes of the same place ([0]) by the relatedness of the rhemes of another place concerns the core of our network approach. Such relations among rhemes constitute rhematic networks or networks of rhemes on both sides of the affected places. Our main assumption is now that any such rhematic network, which manifests the thematic structure of a place, can be related as a whole to that of another place. In doing so it is, from a modeling point of view, ideally parallelized by the structural relatedness (e.g. similarity or complementarity) of topic networks, which are derived from corpora of texts, each of which describes one of these places ([5]). This type of parallelization affects entire networks of linguistic objects, and yet offers a means of encoding the conceptual relationship of places ([1]) or the proximity of spaces, respectively.
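The syntagmatic variant of this encoding, i.e. the statistical relatedness of types via their co-occurrence profiles, can be illustrated in a strongly simplified form. The following sketch (with invented tokens; the measures actually used in this paper are introduced in Section 3) compares two place names by the cosine of their context count vectors:

```python
from collections import Counter
from math import sqrt

def cooccurrence_vectors(tokens, window=2):
    """Map each type to a count vector of its co-occurring types
    within a symmetric window of the given size."""
    vecs = {}
    for i, w in enumerate(tokens):
        ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented toy "discourse" mentioning two places:
tokens = ("vienna opera music vienna concert music "
          "salzburg opera music salzburg festival music").split()
vecs = cooccurrence_vectors(tokens)
sim = cosine(vecs["vienna"], vecs["salzburg"])
print(sim)  # high, since both names keep similar lexical company
```

The higher this value, the more similar the "lexical company" of the two place names, which is exactly the kind of syntagmatic signal exploited in the studies reviewed in [89].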
In the present paper we explore relations of Type [5] in order to learn about the encoding of geographical information in natural language texts, that is, about relations of Type [1]. To this end, we develop, instantiate and empirically test a formal model of multiplex topic networks derived from so-called linguistic multilayer networks as a model of relations of Type [5].
From this point of view, Hypothesis 1 means that certain rhemes of places and the structure they span resemble each other, regardless of how large the quantified distances of the spaces represented by these places are and regardless of the fact that the texts in which these rhemes are described are written by different communities. To test this hypothesis, we introduce topic networks to make the networking of topics a research object according to the scenario described in Figure 1, that is, in relation to the hypothesis of the linguistic encoding of geographical information. The contributions of this article are of a theoretical, methodical and empirical nature: (1) Formal modeling: we develop a generic, extensible formalism for the representation of topic networks that covers a wide range of informational sources for spanning and weighting topic links. To this end, we introduce the notion of multiplex topic networks derived from so-called multilayer linguistic networks. In this way we enable the same place to be represented by a family of thematic networks that offer different perspectives on the networking of its rhemes. We exemplify this model by means of two perspectives provided by so-called Text Topic Networks (TTN) and their corresponding Author Topic Networks (ATN). (2) Procedural modeling: we develop a measurement procedure for instantiating our formal model. To this end, we introduce novel measures of the similarity of labeled graphs that are sensitive to their links and to their nodes. (3) Experimentation: we further develop the range of baseline statistics in network theory in order to better assess the quality of our measurements. To this end, we test our model by means of a threefold classification experiment that compares a set of TTNs with each other, a set of corresponding ATNs with each other, and the former TTNs with the latter ATNs.
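To give a first, deliberately minimal intuition of what a node- and link-sensitive similarity of labeled graphs can look like (the measures developed in this paper are considerably more elaborate), one may reduce two hypothetical topic networks to sets of labeled nodes and links and combine two Jaccard coefficients; all labels below are invented:

```python
def graph_similarity(g1, g2, alpha=0.5):
    """Toy similarity of labeled graphs g = (nodes, edges):
    a convex combination of node-label and link-label overlap."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0
    (n1, e1), (n2, e2) = g1, g2
    return alpha * jaccard(n1, n2) + (1 - alpha) * jaccard(e1, e2)

# Two hypothetical topic networks of two places; undirected links
# are modeled as frozensets of topic labels.
ttn_a = ({"music", "arts", "history"},
         {frozenset(("music", "arts")), frozenset(("arts", "history"))})
ttn_b = ({"music", "arts", "sports"},
         {frozenset(("music", "arts")), frozenset(("arts", "sports"))})
print(graph_similarity(ttn_a, ttn_b))  # 0.5 * 1/2 + 0.5 * 1/3 ≈ 0.4167
```

Identical networks yield 1.0; the parameter alpha weights node overlap against link overlap.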
(4) Theory formation: we interpret our findings in the context of cognitive maps, thus building a bridge between our network-theoretical approach and approaches to the cognitive representation of geographical information. We show how to integrate the analysis of entire networks into the research about the linguistic encoding of geographical information (see Figure 1). The paper is organized as follows: Section 2 discusses related work. Section 3 introduces our formal model of linguistic multilayer networks and of the multiplex topic networks derived from them. Section 4 describes our experiments in detail and Section 5 discusses our findings. Finally, Section 6 concludes and gives an outlook on future work.

RELATED WORK
Our work is related to linguistic research on Tobler's [132] first law (TFL) which says that "[. . . ] everything is related to everything else, but near things are more related than distant things." [132,236]. Due to its underspecification, this so-called law raised many questions about what it means to be related or distant [104]. Accordingly, a range of approaches exist that make different proposals to interpret relatedness also in terms of semantic relatedness. In the context of information visualization, Montello et al. [106] test a variant of TFL called the first law of cognitive geography which says that "people believe closer things to be more similar than distant things" [106,317], where spatial distance is referred to for judging the similarity of information objects. This approach is contrasted with a study by Hecht & Moxley [59] who model relations of Wikipedia articles as a function of the probability of being linked in the web graph and find that this probability is related to the geographical distance of toponyms described in the articles. Hecht & Moxley relate their finding to the transitivity of networks by stating that the smaller the geographical distance of nodes, the higher their clustering coefficient [59,101]. This work is extended by Li et al. [85], who calculate semantic relationships of articles instead of hyperlinks and show that TFL holds independently of the geographical domain up to a certain distance threshold. A lexical variant of TFL is mentioned by Yang et al. [144], according to which geographically close words tend to be clustered into the same geographical topics. This phenomenon has earlier been studied by Louwerse et al. [cf. the review in 89] who reformulate Firth's famous dictum by saying that "[. . . ] you shall know the physical distance between locations by the lexical company they keep." [89,1557]. This means that the distance of places correlates with syntagmatic associations between the lexical items used to describe them.
That is, language encodes geographical information [90], at least regarding the distances of semantically related places. From this perspective, TFL appears to be reformulated as a candidate for a geolinguistic law that is compatible with the more general Symbol Interdependency Hypothesis (SIH) [88]. According to SIH, linguistic information encodes perceptual information so that the former serves as a shortcut to the latter [88]. Finally, a rather text-linguistic variant of TFL is proposed by Adams & McKenzie [2], which states that near places are each described by texts whose topics are more similar than in the case of texts about distant places.
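The distance-decay reading of TFL underlying these studies can be sketched numerically: if lexical association falls off with geographic distance, the two variables correlate negatively. The distance/association pairs below are invented solely for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical city pairs: (geographic distance in km, lexical association)
pairs = [(60, 0.80), (250, 0.60), (700, 0.45), (1500, 0.30), (4000, 0.25)]
r = pearson([d for d, _ in pairs], [a for _, a in pairs])
print(r)  # clearly negative: nearer pairs are more strongly associated
```

A real test of TFL would of course use measured distances and corpus-derived association scores rather than these toy values.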
In contrast to these approaches, we hypothesize that places, no matter how far apart, have similar topic distributions when their descriptions are transmitted by media such as city and region wikis. If we find evidence for this hypothesis, there are various candidates for explaining it: firstly, such a finding could indicate a trivial meaning of TFL [cf. 104] in relation to the topics modeled by us, implying that everything, distant or not, is highly related. Secondly, it could indicate the (in-)effectiveness of distances and similarities at different scales: at the level of local, specific topics (within the scope of TFL) and at the level of global, more general topics (outside the scope of TFL).
Thirdly, such a finding could indicate a hidden similarity of processes of collaboratively writing wikis about different places, even if the wikis are written by different communities (see Hypothesis 1). In order to decide between these alternatives, we need a new topic model that derives networks of thematic structures at different scales from texts in online media about the same places. This should at least include the networking of topics along relations of intertextuality and co-authorship in order to allow for revealing similarities of the underlying processes of collaborative writing. To this end, we will develop multiplex networks that integrate text- and author-driven topic networks.
So far, most approaches to thematic aspects of places use topic modeling based on Latent Dirichlet Allocation (LDA) to associate topics and texts about geographical units, where topics are represented as sets of thematically related words. An early approach in this regard is described by Mei et al. [102] who model spatio-temporal theme patterns to identify dominant topics in texts that are connected to places. A related approach is proposed by Hao et al. [56], who aim to detect topics that are "localized" in places. This is done to ground their similarities in relations of their thematic representations, a scenario that is omnipresent in linguistically motivated work in the context of TFL (cf. Figure 1). Likewise, Adams & McKenzie [2] extract topic models from travel blogs to detect topics as groups of semantically related words associated to places, so that relations among places can be identified by shared topics. Another example is proposed by Bahrehdar & Purves [6]: instead of documents written by individual authors, they analyze tagging data extracted from image descriptions in Flickr. A hybrid model of topic modeling comes from Yin et al. [145], in which representations of regions are used instead of documents to link topics to places. A related region-topic model that uses regions as topics to map words, sentences and texts to distributions of regions or to ground them semantically [cf. 120] is proposed by Speriosu et al. [128]. A promising extension is developed by Gao et al. [44] who aim at detecting higher-level functional regions as semantically coherent areas of interest. To this end, they analyze co-occurrence relations between topics to describe many-to-many relations of locations and urban functions. Another direction is pursued by Lansley & Longley [80], who investigate the location- and time-based distribution of topics in Twitter, setting a number of twenty topics as a target for LDA. See also Jenkins et al.
[70] who utilize a list of six high-level topic categories. One of the largest studies in this context is the one of Gao et al. [45] who present an integrative approach to modeling texts from a range of different media such as Wikipedia, Twitter, Flickr etc. to demarcate cognitive regions [105]. All these approaches start from topic modeling to map natural language texts onto distributions of topics in order to relate the places thematized by these texts (cf. Figure 1).
A prominent precursor of topic models [81] is given by Latent Semantic Analysis (LSA) [79]. Consequently, there are studies in the context of TFL based on this predecessor. Davies [30], for example, interprets the associations of place names computed by LSA from place descriptions as a model of the cognitive representation of the corresponding spaces [cf. 31]. This approach opens up a perspective for measuring biased cognitive representations of spatial systems: according to Davies, her approach provides representations of cognitive geographies that are explored by the associations of semantically close place names in accordance or not with the underlying geographical relations, that is, in accordance or not with TFL [cf. 120]. These and related studies produce interesting results about the localization of topics or, vice versa, about the thematization of places in texts. However, they mostly disregard topic networking, not to mention the networking of topics viewed from different angles. Although it is easy to derive a network approach from binary relations of topic similarity, relationships that cannot be traced back to sharing similar words are hardly mapped by topic models of the sort considered so far. By generating topic distributions per location, for example, we know nothing about the dynamics of the co-authorship of the underlying texts: in the extreme case one observes (dis-)similarities which result from the activity of a small number of authors or even only one author, in contrast to the assumed collaboration density of online media such as Wikipedia. Therefore, it is our goal to develop a model of topic networks that simultaneously addresses the dynamics of the co-authorship of the underlying texts. A subtask will be to develop a formal model of thematic networking that is generic enough to integrate a wide range of sources of networking, at least theoretically.
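The LSA-style association of places mentioned above can be sketched as follows, assuming numpy is available: a toy term-document matrix over invented place descriptions is reduced by truncated SVD, and the places are then compared in the resulting latent semantic space:

```python
import numpy as np

# Invented term-document matrix: rows = terms, columns = place descriptions.
terms = ["opera", "music", "alps", "ski", "museum"]
places = ["vienna", "salzburg", "innsbruck"]
A = np.array([[3, 1, 0],   # opera
              [2, 2, 0],   # music
              [0, 1, 3],   # alps
              [0, 1, 2],   # ski
              [1, 1, 1]],  # museum
             dtype=float)

# LSA: truncated SVD projects the documents into a low-dimensional
# latent semantic space (here k = 2 dimensions).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one latent vector per place

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cos(doc_vecs[0], doc_vecs[1]))  # vienna vs. salzburg (high)
print(cos(doc_vecs[0], doc_vecs[2]))  # vienna vs. innsbruck (low)
```

Whether such latent associations agree with the underlying geographical relations is precisely what Davies checks against TFL.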
While most of the approaches considered so far ignore aspects of networking, a second branch of research tends to follow the paradigm of network theory. Hu et al. [68], for example, measure the semantic relatedness of cities as nodes of a city network [124] depending on the co-occurrences of city names in news articles. This approach is related to Liu et al. [87], who explore co-occurrences of toponyms to induce city networks that can be used to test predictions associated with TFL. Hu et al. [68] further develop this approach to networking cities by reference to topics of articles in which the corresponding toponyms are observed. They use Labeled LDA [117] to learn to extract topics α from texts in order to finally determine the α-relative similarity of cities based on the co-occurrences of their names in texts about α. Another approach to city networks using Wikipedia as a data source is proposed by Salvini & Fabrikant [124]: they link cities as a function of the number of articles "co-siting" [12] their Wikipedia articles. A comprehensive perspective on modeling spatial information is developed by Luo et al. [91], who propose a three-part network model that integrates representations of spatial, social, and semantic networks. In this conceptual model, semantics plays the role of interpreting behavior in spatial and social space and thus of bridging them. Although we share this hybridization of the network perspective on spatial information, we strive for a more concrete model that can be empirically tested.
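The co-occurrence-based induction of city networks pursued in this branch of research can be sketched in a strongly reduced form. The documents and city names below are invented; the actual studies operate on large news or Wikipedia corpora:

```python
from collections import Counter
from itertools import combinations

def city_network(documents, cities):
    """Link two cities whenever their names co-occur in a document;
    the edge weight counts such co-occurrences."""
    edges = Counter()
    for doc in documents:
        present = sorted(c for c in cities if c in doc.lower())
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges

docs = [
    "Direct trains connect Vienna and Salzburg every hour.",
    "The festival tours Salzburg, Innsbruck and Vienna.",
    "Innsbruck hosts winter sports events.",
]
net = city_network(docs, {"vienna", "salzburg", "innsbruck"})
print(net)  # ('salzburg', 'vienna') co-occur twice, the other pairs once
```

The resulting weighted edge list can then be compared against geographic distances in the spirit of TFL; note that robust toponym detection would require named entity recognition rather than this naive substring test.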
Any such study has to face various aspects of the vagueness [4,105] or informational uncertainty [51] of concepts of regions [105] and places [70] and especially of the names of such entities [45].
According to Winter & Freksa [142] this includes semantic ambiguity, indeterminacy of spatial extent or boundary vagueness [45], preference-oriented re-scaling of extent, and the dynamics of salience affected by various dimensions of contrast. Beyond boundary vagueness, Gao et al. [45] speak of shape and location vagueness by example of cognitive regions. Furthermore, Jenkins et al. [70] refer to the temporal dynamics of places as evolving concepts as a source of uncertainty. From a methodological point of view, this multi-faceted uncertainty has two implications: in relation to the model, which should be flexible enough to map these facets, and in relation to the object itself, which could complicate its modeling by unsystematically distorting it.
In accordance with Hu [67] we assume that the thematic perspective complements the spatial and temporal perspectives of the study of places. A rheme can be understood as the "content" of a geographical region that expands its dimensionality [105]. This content may be further specified in terms of affordances, functions or shared conceptual representations associated by members of a community with the corresponding place, so that different places can be related by being associated with similar content. This thematic perspective will be at the core of our article. To this end, we follow the approach of Jenkins et al. [70], according to which places are connected with meanings generated by collaborators of crowd-sourcing media such as Wikipedia: their collaboration creates what Jenkins et al. call platial themes, namely themes that are characteristic of certain places. As shared meanings, these platial themes ultimately create a "collective sense of place", as it is perceived by the corresponding community. In this context, Jenkins et al. [70] propose to study politics, business, education, recreation, sports, and entertainment as six high-level topics of places. However, by reference to the Dewey Decimal Classification (DDC) we will instead deal with more than six hundred hierarchically organized topics, each of which is manifested by a range of Wikipedia articles. In any event, we have to consider that thematic aspects may distort the conceptualization and perception of spatial objects [45]. A central question then concerns the regularity or systematicity of this distortion in the sense of asking to what extent thematic representations of different places show similar aspects of being biased. This question will be at the core of this article.

MULTIPLEX TOPIC NETWORKS: A NOVEL APPROACH TO TOPIC MODELING
In order to study relations of thematic preference in VGI as a manifestation of distributed cognition, we introduce Topic Networks (TN) as an alternative to Topic Models (TM) [14,15,130]. TMs are based on the idea that texts manifest probabilistic distributions of topics which are represented as probability distributions over the lexical constituents of these texts, where these distributions may be affected by style, the underlying genre or any other (syntactic, semantic or pragmatic) criterion of text production [61,66,122]. Regardless of its success, this model is unsuitable for modeling TNs as manifestations of distributed cognitive maps because of the following problems:

P1 Corpus specificity: the corpus specificity of TMs impairs comparability and transferability to ever new corpora, since the topic distributions are learned from the input corpora whose topics are to be modeled. This approach apparently cannot use a transferable topic model as a basis for representing the topics of a large number of different corpora.

P2 Topic labeling: the corpus-specific derivation of topic labels from the input corpora makes it difficult to compare their topic distributions. As reviewed by Herzog et al. [64], external resources can be used for this task. However, there are hardly any such resources for all possible topic combinations, unless one wants to explore an overarching system such as Wikidata, making such a project considerably more difficult due to its size. The labeling problem can be addressed using, for example, Labeled LDA [117], an approach that leads us into the area of supervised classification, which is also followed here.

Fig. 2. Schema of mapping texts onto hierarchically organized topic networks: words, sentences and texts describing a certain thematic frame (e.g. a place as the central topic of a city wiki) are mapped onto a topic hierarchy as an example of a so-called generalized tree [33,95]. Based on kernel links of thematic specialization, the topics are organized hierarchically, whereby this organization is superimposed by up- and downward cross references. Dashed links are inferred as a result of modeling the thematic networking of input words, sentences or texts. As we assume that the underlying topic model has been trained by means of a reference corpus R (see Definition 3.2), each topic is associated with a distribution of lexical elements of R that are preferably used to manifest this topic (see the types in relation to the topics α, β in Figure 1). This preference relation may be extended to higher-level units such as sentences etc.
P3 Scalability: instead of dealing with corpora of equally large texts, online communication often leads to sparse, tiny texts that sometimes consist of a single sentence, a single phrase or a single word. Regardless of the size of the text, we need a procedure that determines its topic distributions so that texts of different size can be compared using topic models of comparable size. Even if small texts are post-processed (after topic modeling) in such a way that their topic distributions are derived from their lexical constituents, such an approach would nevertheless mean to exclude text snippets from the training process.

P4 Rare topics: one reason to prefer training by means of corpora as large as Wikipedia is to allow for detecting topics even if they form a kind of thematic hapax legomenon in the corpora to be analyzed. If we try to identify rare topics directly from these corpora, we will probably not detect them, since by definition these corpora do not provide enough information to identify such topics. In any event, the rarity of evidence about a topic should not be an impediment to identifying its occurrences even at the level of single sentences.

P5 Methodical closeness: instead of deriving all distributions of all dependent and independent variables as part of the same topic model, one possibly wants to include different information sources that are computed by different methods based on diverse computational paradigms (e.g., ontological approaches to measuring sentence similarities, approaches to word embeddings based on neural networks, topic models, etc.). In order to enable this, we look for a methodologically open topic model that allows such different resources to be easily integrated.
In a nutshell: We are looking for an approach that (i) allows thematic comparisons of previously unforeseen text corpora using an underlying reference corpus, (ii) offers a generic solution to the problem of topic labeling, (iii) is highly scalable and can therefore map even the smallest text snippets to topic distributions, (iv) simultaneously takes rare topics into account and (v) is methodologically open and expandable. Such a topic network model is now developed in two steps: in Section 3.1 we introduce the underlying formal apparatus. This is done by deriving multiplex topic networks from linguistic multilayer networks. Section 3.2 describes a method by which this model is instantiated as a prerequisite for its empirical testing.

From Linguistic Multilayer Networks to Multiplex Topic Networks
In this section, we introduce multiplex topic networks.
This is a type of network that is based on the idea of deriving the networking of topics of textual units by evaluating evidence from different sources of information such as text vocabulary, higher-level text components, distributed authorship or readership, genre, register or medium. Since these sources of evidence can be explored in different compositions, this can lead to different perspectives on the salience and networking of the topics addressed by the same texts. Topic networks are multiplex in precisely this respect: the different evidence-providing perspectives may lead to different topic networks that allow comparisons to be made through which differences in the linguistic, social or otherwise contextual embedding of thematizations become visible. This concept of a multiplex topic network is now being generically formalized.
To introduce multiplex topic networks, we start with defining linguistic multilayer networks (Definition 3.1) whose layeredness allows for distinguishing several (non-)linguistic information sources of topic networking. We refer to supervised topic classifiers trained by means of large reference corpora to tackle the challenges P1, P2, P3 and P4. Based thereon, we introduce so-called text topic networks (Definition 3.3), which evaluate intra- and intertextual relations for the purpose of topic networking. Then, we introduce two-level topic networks (Definition 3.4) and exemplify them by author (Definition 3.5) and word topic networks (Definition 3.6), which explore relations of (co-)authorship and lexical relatedness, respectively, as sources of topic networking. These notions are generalized to arrive at n-level topic networks (Definition 3.7), which are based on n > 1 informational sources of topic networking (cf. challenge P5). Finally, multiplex topic networks are defined as families of n-level topic networks (Definition 3.8) representing the networking of the same set of topics from different informational perspectives and thus allowing for mapping the thematic dynamics, for example, of descriptions of the same place.
Definition 3.1. Let X = {x_1, . . . , x_n} be a corpus of texts. A Linguistic Multilayer Network (LMN)

L(X, l) = (L, C)    (1)

is a tuple of two sets of directed graphs such that the set of kernel layers L consists of a pivotal text layer and several derivative layers, that is, a coauthoring layer, a language-systematic word layer and possibly several layers modeling the networking of constituents of the pivotal texts, while C is the set of margin layers L_{i.j} connecting vertices of different kernel layers L_i, L_j:

(1) the pivotal text layer L_1 = (V_1, A_1, µ_1, ν_1, λ_1, κ_1), also called text network, is spanned by the texts of the corpus V_1 = X such that A_1 manifests intra- (as in the case of reflexive arcs) or intertextual relations,
(2) the author layer L_2 = (V_2, A_2, µ_2, ν_2, λ_2, κ_2), also called agent network, is spanned by the network of agents (co-)authoring the texts in V_1 and their social relations,
(3) the lexicon layer L_3 = (V_3, A_3, µ_3, ν_3, λ_3, κ_3), also called word network, is spanned by the language-systematic lexical signs (i.e., lexemes and related units) used by agents of V_2 as part of their agent lexica to author the texts in V_1,
(4) any further layer L_i = (V_i, A_i, µ_i, ν_i, λ_i, κ_i), 3 < i ≤ l, is called a constituent layer if it models the networking of (e.g., lexical, phrasal, sentential etc.) constituents of texts x ∈ V_1 such that A_i maps intra- (e.g., anaphoric) or intertextual (e.g., sentence similarity) relations,
(5) and it is called a contextual layer if it models the networking of units (e.g., media, genres, registers [55] etc.) of the contextual embedding of texts x ∈ V_1 such that A_i maps, for example, relations of the switching, merging or embedding [25,139] of these contextual units.

For i, j = 1..l, i ≠ j, µ_i, µ_{i.j} are vertex weighting functions, ν_i, ν_{i.j} are arc weighting functions, λ_i, λ_{i.j} are vertex labeling functions and κ_i, κ_{i.j} arc labeling functions. We say that the linguistic multilayer network L(X, l) is spanned over the text corpus X and layered into l layers.
Example 3.1. To illustrate our definitions, we construct a minimized example. Suppose a corpus of four texts V_1 = X = {x_1, . . . , x_4} over the vocabulary {w_1, . . . , w_9} (for reasons of simplicity we exemplify texts as bags-of-words), where, for example, x_4 = {w_4, w_8, w_9} and the intersection of x_1 and x_2 is {w_1, w_2}. That is, V_3 = {w_1, . . . , w_9}, V_{3.1} = {w_1, . . . , w_9, x_1, . . . , x_4} and A_{3.1} = {(w_1, x_1), (w_2, x_1), (w_3, x_1), . . . , (w_4, x_4), (w_8, x_4), (w_9, x_4)}. Further, we assume four authors V_2 = {a_1, a_2, a_3, a_4} such that a_1 and a_2 co-authored x_1 and x_2, while a_3 and a_4 co-authored x_3 and x_4, that is, V_{2.1} = {a_1, . . . , a_4, x_1, . . . , x_4} and A_{2.1} = {(a_1, x_1), (a_2, x_1), (a_1, x_2), (a_2, x_2), (a_3, x_3), (a_4, x_3), (a_3, x_4), (a_4, x_4)}. Further, we assume that the texts x_1, x_2 are linked by some intertextual coherence relation (e.g. by a rhetorical relation, an argument relation or by some hyperlink), as are the texts x_3, x_4, so that A_1 = {(x_1, x_2), (x_3, x_4)}. Note that additional arcs of the layers L_1, L_2, L_3 will be generated according to the subsequent definitions. For reasons of simplicity we assume all weighting functions to be limited to the set {0, 1} of vertex/arc weights. Since we assume no additional constituent layer we get l = 3. Thus, any linguistic multilayer network L(X, 3) based on this setting is layered into three layers. Throughout this paper, we use the following simplifying notation: for any graph G = (V, A, λ) of order |G| = |V|, arc set A ⊆ V² of size |A| and vertex labeling function λ and any vertex v ∈ V, we write v̄ = λ(v). Thus, for any two graphs G_i, G_j with vertex labeling functions λ_i and λ_j, for which λ_i(v) = λ_j(w), v ∈ V_i, w ∈ V_j, we can write v̄ = w̄.
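The setting of this example can be sketched in code. The following is a minimal Python encoding using plain sets (no graph library); the parts of A_{3.1} elided in the example are left out, and all weighting functions are restricted to {0, 1} as assumed above.

```python
# Minimal encoding of Example 3.1 (hypothetical sketch; the elided arcs of
# A_{3.1} are omitted, so A31 below covers only the arcs listed there).

# Pivotal text layer L1: vertices V1 and intertextual arcs A1.
V1 = {"x1", "x2", "x3", "x4"}
A1 = {("x1", "x2"), ("x3", "x4")}

# Author layer L2 and the margin arcs A_{2.1} (authorship links into L1).
V2 = {"a1", "a2", "a3", "a4"}
A21 = {("a1", "x1"), ("a2", "x1"), ("a1", "x2"), ("a2", "x2"),
       ("a3", "x3"), ("a4", "x3"), ("a3", "x4"), ("a4", "x4")}

# Lexicon layer L3 and the known part of the margin arcs A_{3.1}.
V3 = {f"w{i}" for i in range(1, 10)}
A31 = {("w1", "x1"), ("w2", "x1"), ("w3", "x1"),
       ("w4", "x4"), ("w8", "x4"), ("w9", "x4")}

# All weighting functions are limited to {0, 1}: an arc present in a set
# has weight 1, all others weight 0.
def nu(arcs, a, b):
    return 1 if (a, b) in arcs else 0

print(nu(A1, "x1", "x2"), nu(A1, "x2", "x1"))  # arcs are directed: 1 0
```

Note that the layers are deliberately kept as separate vertex and arc sets, mirroring the layered structure of Definition 3.1 rather than a single merged graph.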
Further, for any function f : X × Y → Z, for which f(x, y) = z, we use the alternative notation x →_f y = z. Finally, for any function f : Zⁿ → Z we introduce a corresponding notation based on square brackets. To leave no room for ambiguity, we assume that expressions of the sort x →_f y, y →_f x are replaced from left to right into expressions of the sort x ↔_f y. Henceforth, a structure such as x →_f y will be called an information link. Based on Definition 3.1 we start now with introducing text topic networks using the following auxiliary notion:

Definition 3.2. Let C = (V_C, A_C) be a directed Generalized Tree (GT) according to [96,97] representing a hierarchical topic structure, henceforth called Reference Classification System (RCS), that is spanned by kernel arcs which are possibly superimposed by upward, downward, lateral, sequential, external or reflexive arcs. That is, vertices t ∈ V_C represent topics, while kernel arcs (t, u) ∈ A_C represent subordination relations according to which u is a thematic specialization of t. Let further θ be a hierarchical text classifier [126] taking values in V_C that has been trained, validated and tested by means of a reference corpus R. Let now L(X, l) = (L, C) be an LMN spanned over the text corpus X and layered into l layers. We call the structure S = (C, θ, L(X, l)) a Definitional Setting for defining topic networks.

Example 3.2. Using the Dewey Decimal Classification (DDC, see Figure 15) and the topic classifier θ of [138], which uses the DDC as its reference classification system C, a definitional setting is exemplified by (DDC, θ, L(X, 3)). More specifically, by t_1, t_2, t_3 we will denote three topic labels of the third level of the DDC so that V_C = {. . . , t_1, t_2, t_3, . . .}. Note that by using the DDC as a reference classification, the generalized tree of Definition 3.2 is reduced to a tree (see Section 3.2 for more details).
Definition 3.3. Given a definitional setting S = (C, θ, L(X, l)) according to Definition 3.2, a Text Topic Network (TTN) is a vertex- and arc-weighted simple directed graph with vertex set V and arc set A ⊆ V² which is said to be derived from S and inferred from L_1 by means of the optional classifier θ^← and monotonically increasing functions α, β, γ, δ, where µ : V → R⁺ is a vertex weighting function (Formula 9), ν : A → R⁺ an arc weighting function (Formula 11), λ : V → V_C an injective vertex labeling function and κ an injective arc labeling function. T(L_1) is called a one-layer topic network that is generated by the generating layer L_1.
Formulas 9 and 11 require that the weighting values for nodes and arcs are greater than 0: otherwise, the candidate vertices and arcs do not exist in the TTN. θ^← is a classifier mapping pairs (t, x) of topics t ∈ V_C and texts x onto real numbers indicating the extent to which x is a "prototypical" instance of t.

Example 3.3. In our example, we disregard θ^←. Further, we assume that the functions α, β, γ, δ are identity functions. Thus, for the topic vertices v_1, v_2, v_3 with v̄_i = t_i (x_1 instantiating t_1, x_2 instantiating t_2, and x_3, x_4 both instantiating t_3), we get µ(v_1) = µ(v_2) = 1 and µ(v_3) = 2. Now, we can generate a topic link between v_1 and v_2 by exploring the intertextual relation (x_1, x_2) ∈ A_1; by analogy to this case, we link topic v_3 by means of a reflexive link. Note that these simplifications are made for illustrative purposes only: Section 3.2 will elaborate a realistic weighting scenario. The function of the latter illustration is to show that by the intertextual linkage of both texts, we get evidence about the linkage of the topics instantiated by these texts. TTNs always operate according to this premise: they network topics as a function of the networking of an underlying set of texts. Figure 3 gives a schematic depiction of this scenario, which is varied subsequently to illustrate the other types of topic networks developed in this paper.
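Under these simplifications, the TTN premise (topics are networked as a function of the networking of texts) can be sketched in Python as follows; the topic assignment x1 → t1, x2 → t2, x3, x4 → t3 is the one implied by the example, everything else mirrors Example 3.1.

```python
from collections import Counter

# Minimal TTN sketch in the spirit of Example 3.3: α, β, γ, δ are identity
# functions and θ^← is disregarded.
topic = {"x1": "t1", "x2": "t2", "x3": "t3", "x4": "t3"}  # implied assignment
A1 = {("x1", "x2"), ("x3", "x4")}   # intertextual arcs of weight 1 each

# Vertex weight of a topic: number of texts instantiating it
# (a simplified reading of Formula 9).
mu = Counter(topic.values())

# Arc weight of a topic pair: accumulated weight of text arcs whose source
# and target instantiate the two topics (a simplified reading of Formula 11).
# A linked text pair on the same topic yields a reflexive topic link.
nu = Counter((topic[x], topic[y]) for x, y in A1)

print(dict(mu))            # {'t1': 1, 't2': 1, 't3': 2}
print(sorted(nu.items()))  # [(('t1', 't2'), 1), (('t3', 't3'), 1)]
```

The same two-step pattern (aggregate texts per topic, then aggregate text arcs per topic pair) underlies the more realistic weighting scheme of Section 3.2.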
A concrete example of a TTN that is derived from the articles of the so-called Dresden wiki (see Section 4.1) is depicted in Figure 4. It shows the highest weighted topics addressed by these articles and their (undirected) links. The TTN has been computed by means of the procedural model of Section 3.2. Evidently, the topic Transportation; ground transportation is most prominent in this wiki, followed by the topic Central Europe; Germany. Most topics belong to the areas transportation (red), geography and history (turquoise) and architecture (gray) (for the color code see the appendix). More examples of TTNs can be found in Figures 7, 12 and 13. Arguments of the sort x →_θ v can be used to quantify evidence about text x as an instance of topic v: the more evidence of this sort, the higher possibly the impact of x in Formula 9, the higher possibly the final weight of v. The adverb possibly refers to what is licensed by the parameters γ, δ. Arguments of the sort x →_{ν_1} y, where x ≠ y, can be used to quantify evidence that text x is intertextually linked to text y: the more evidence of this sort, the higher possibly the weight of the link from x to y, the higher possibly the influence of this link onto the weight of the link from topic v to topic w in Formula 11. In this and related definitions, we do not fully specify the functions θ, θ^←, α, β, γ, δ to leave enough space for different instances of topic networks.
Definition 3.3 relies on the pivotal text layer for deriving topic networks. To integrate further layers into the process of inferring topic networks, we introduce the following generalized schema:

Definition 3.4. Given a definitional setting S = (C, θ, L(X, l)) according to Definition 3.2, an (L_1, L′)-Topic Network T(L_1, L′), L′ ⊆ L \ {L_1}, |L′| ≤ 1, is a vertex- and arc-weighted simple directed graph which is said to be derived from S and inferred from L_1 and the elements of L′ by means of the optional classifiers θ^←, ϑ^← and monotonically increasing functions α, β, γ, δ, where µ : V → R⁺ is a vertex weighting function (Formula 13), ν : A → R⁺ an arc weighting function (Formula 14), λ : V → V_C an injective vertex labeling function and κ an injective arc labeling function. For L′ = {L_i}, we say that T(L_1, L′) is a two-level topic network that is generated by the generating layers L_1 and L_i. If L′ = ∅, then Formula 13 changes to Formula 9 and Formula 14 to Formula 11. By omitting any optional classifier f ∈ {θ^←, ϑ^←}, expressions of the sort r ↔_f v change to r →_f v. ϑ is treated analogously.

Fig. 5. x, y ∈ V_1 denote two texts, a, b ∈ V_2 denote two authors working on x and y, respectively, p, q ∈ V_3 denote two lexical units occurring in x and y, respectively. Inferred weights of vertices are denoted by means of (red) reflexive arcs.
To understand Formula 13 look at Figure 5: among other things, Formula 13 collects the triangle spanned by v, x and a, supposing that the two-level topic network is based on text and authorship links. Obviously, Definition 3.4 generalizes Definition 3.3. Now it should be clear why we speak of the text network of an LMN as its pivotal layer: it is the reference layer of any additional layer that is integrated into a two-level topic network according to Definition 3.4. This role is maintained below when we generalize this definition to capture n layers, n > 2. With the help of Definition 3.4, we can immediately derive so-called author topic networks:

Definition 3.5. An Author Topic Network (ATN) is a two-level topic network T(L_1, {L_2}) generated by means of the text and the author layer. The relational arguments of this definition can be motivated as follows, assuming that they are instantiated appropriately:

(1) x →_θ v can be used to represent evidence that text x is about topic v, possibly in relation to other topics of V_C.
(2) v →_{θ^←} x can be used to represent evidence that text x is a prototypical instance of topic v, possibly in relation to other texts in V_1.
(3) r →_ϑ v can be used to represent the extent to which agent r tends to write about topic v, possibly in relation to other topics of V_C.
(4) v →_{ϑ^←} r represents evidence that agent r is a prototypical author writing about topic v, possibly in relation to other agents in V_2.
(5) x →_{ν_1} y can be calculated to represent evidence about text x being intertextually linked to text y (e.g. in the sense of linking contributions of different authors). Otherwise, if x = y, x →_{ν_1} x can be used to quantify evidence about x being intratextually structured.

(6) r →_{ν_2} s represents evidence that agent r is a coauthor of or interacting with s. For instantiating ν_2, the literature knows a wide range of alternatives [19,111] (which mostly concern symmetric measures of co-authorship). Note that we do not require that r ≠ s.

Example 3.4. Starting from Example 3.3 to exemplify arcs between topics in author topic networks, we can now additionally explore the evidence that the texts x_1 and x_2 are both co-authored by the agents a_1, a_2. That is, we can assume a co-authorship link (a_1, a_2) ∈ A_2 (A_2 is the arc set of the author layer in Definition 3.1) of weight ν_2(a_1, a_2) = 1. Let us now assume the following simplification of the function δ in Definition 3.4: it simply multiplies and adds up its argument values. In our example, the co-authorship of a_1, a_2 thus contributes a weight of 1 in addition to the intertextual link (x_1, x_2). Since there is no other interlinked pair of texts (see Example 3.1) instantiating the topics t_1, t_2, we get ν((v_1, v_2)) = 2 as the weight of this topic link in the corresponding ATN. By this simplified example of an ATN, we get the information that the link of topic v_1 to topic v_2 is additionally supported by the co-authorship of agents a_1, a_2: this information extends the evidence about the topic link as provided by the underlying TTN of Example 3.3. Likewise, the reflexive link of topic v_3 is augmented by 1 compared to the underlying TTN, while there is no other topic link to be considered in this example of an ATN. By analogy to Figure 3, Figure 6 gives a schematic depiction of this scenario. Note that in our example, the weight of the link between authors a_1, a_2 (cf. r →_{ν_2} s) is a function of their co-authorship: this is only one alternative to weight the social relatedness of both agents, actually one that can be measured by exploring (special) wikis. However, any other social relatedness might be explored to weight the interaction of agents.
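One plausible reading of this additive weighting scheme can be sketched as follows: the TTN contribution of each text arc is augmented by the weight of every co-authorship arc whose members author the two texts. The data are those of Examples 3.1 and 3.4; the concrete form of δ is an illustrative assumption.

```python
# Sketch of the ATN arc weighting of Example 3.4 under one plausible reading
# of δ: the text-link weight of (x, y) is augmented by the weight of every
# co-authorship arc (r, s) whose members author x and y, respectively.
topic = {"x1": "t1", "x2": "t2", "x3": "t3", "x4": "t3"}
A1 = {("x1", "x2"), ("x3", "x4")}                      # text arcs, weight 1
authors = {"x1": {"a1", "a2"}, "x2": {"a1", "a2"},
           "x3": {"a3", "a4"}, "x4": {"a3", "a4"}}
A2 = {("a1", "a2"): 1, ("a3", "a4"): 1}                # co-authorship arcs

nu_atn = {}
for x, y in A1:
    coauth = sum(w for (r, s), w in A2.items()
                 if r in authors[x] and s in authors[y])
    arc = (topic[x], topic[y])
    nu_atn[arc] = nu_atn.get(arc, 0) + 1 + coauth      # 1 = text-link weight

print(nu_atn[("t1", "t2")])  # 2: intertextual link plus co-authorship
print(nu_atn[("t3", "t3")])  # 2: reflexive link augmented by (a3, a4)
```

The additional unit per topic link, compared to the TTN of Example 3.3, reproduces the augmentation described in the example.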
By comparing a text topic network T(L_1) = (V_{l+1}, A_{l+1}, µ_{l+1}, ν_{l+1}, λ_{l+1}, κ_{l+1}) with an author topic network T(L_1, {L_2}) = (V_{l+2}, A_{l+2}, µ_{l+2}, ν_{l+2}, λ_{l+2}, κ_{l+2}) derived from the same LMN L(X, l), we can learn how the topics of V_C are manifested in the texts of corpus X in the form of a concomitance or a disparity of intertextual and co-authorship-based networking. Consider, for example, two vertices v ∈ V_{l+1}, w ∈ V_{l+2} such that v̄ = w̄; let further ⊥ and ⊤ denote the minimum and maximum that the vertex weighting functions of both graphs can assume. Then we can distinguish four extremal cases: (1) Situations of the sort µ_{l+1}(v) ≈ µ_{l+2}(w) ≈ ⊤ (15) provide information on prominent topics that tend to be addressed by many texts which are coauthored by many authors.
(2) Situations of the sort µ_{l+1}(v) ≈ µ_{l+2}(w) ≈ ⊥ (16) probably apply to the majority of the topics in V_C, which are hardly or even not at all addressed by texts in V_1 = X due to the narrow thematic focus of these texts.
(3) The case µ_{l+1}(v) ≈ ⊤ ∧ µ_{l+2}(w) ≈ ⊥ (17) suggests a Zipfian topic effect, according to which a prominent topic is addressed by a small group of agents or even by a single author. (4) Finally, situations of the sort

µ_{l+1}(v) ≈ ⊥ ∧ µ_{l+2}(w) ≈ ⊤ (18) refer to rarely manifested topics addressed by a few but highly coauthored texts. In conjunction with many cases of the sort described by Formula 17, situations of this kind indicate a Zipfian coauthoring effect, according to which many authors write only a few texts, while many texts are written by a few authors without encountering many (relevant) coauthors. Formulas 15-18 compare the node weighting functions of a TTN with those of a related ATN. The same can be done regarding their arc weighting functions. That is, for two arcs a = (r, s) ∈ A_{l+1} and b = (v, w) ∈ A_{l+2}, for which r̄ = v̄ ∧ s̄ = w̄, we distinguish again four cases (⊥ and ⊤ now denote the minimum and maximum the arc weighting functions of both graphs can assume): (1) In the case of ν_{l+1}(a) ≈ ν_{l+2}(b) ≈ ⊤ (19) topic v is intertextually linked more strongly to topic w and authors of its text instances tend to cooperate with those of instances of topic w likewise to a greater extent.
(2) In the case of ν_{l+1}(a) ≈ ν_{l+2}(b) ≈ ⊥ (20) topic v is intertextually less strongly linked to topic w and the few authors of its textual instances tend to cooperate with authors of instances of topic w likewise to a lesser extent.
(3) In the case of ν_{l+1}(a) ≈ ⊤ ∧ ν_{l+2}(b) ≈ ⊥ (21) topic v is intertextually more strongly connected with topic w, while authors of its text instances tend to cooperate with those of instances of topic w to a lesser extent, if at all.
(4) In the case of ν_{l+1}(a) ≈ ⊥ ∧ ν_{l+2}(b) ≈ ⊤ (22) topic v is intertextually less strongly linked to topic w, while the numerous authors of its text instances tend to cooperate with those of instances of topic w to a much greater extent. Our central question regarding the relationship between TTNs and ATNs derived from the same LMN is whether these networks are similar or not. If they are similar, we expect that cases of the sort described by Formulas 15, 16, 19 and 20 predominate, so that cases matched by Formula 15 are parallelized by those considered by Formula 19 and cases according to Formula 16 are concurrent to those described by Formula 20. An opposite situation would be that two topic nodes in the TTN are highly weighted but weakly linked, while they are weakly weighted but strongly linked in the corresponding ATN. In this case, a few or even only a single author is responsible for the thematic focus of the TTN. Note that this scenario reminds again of a Zipfian effect regarding the relation of TTNs and ATNs. By characterizing TTNs in relation to ATNs along these and related scenarios, we want to investigate laws of the interdependence of both types of networks, which may consist, for example, in the simultaneity of dense or sparse intertextuality-based networking on the one hand and dense or sparse co-authorship-based networking on the other. We may expect, for example, that the more related two topics, the more likely the authors of their textual instances cooperate. However, not so much is known about such scenarios in the area of VGI, especially with regard to Hypothesis 1. Thus, we address this gap, at least by introducing a novel theoretical model which may help filling it. Figure 7 exemplifies two ATNs in relation to a corresponding TTN (T1) which were computed using the apparatus of Section 3.2 to instantiate the formal model of this section.
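The four extremal cases of Formulas 19-22 can be sketched as a small classifier over arc weights; the normalization of weights to [0, 1] and the cutoffs lo/hi are illustrative assumptions, not part of the formal model.

```python
# Sketch of the four extremal cases of Formulas 19-22 for a pair of arcs
# carrying the same topic labels in a TTN and an ATN. Weights are assumed
# to be normalized to [0, 1]; lo and hi are illustrative cutoffs.
def regime(nu_ttn, nu_atn, lo=0.2, hi=0.8):
    if nu_ttn >= hi and nu_atn >= hi:
        return "strong link, strong cooperation"        # Formula 19
    if nu_ttn <= lo and nu_atn <= lo:
        return "weak link, weak cooperation"            # Formula 20
    if nu_ttn >= hi and nu_atn <= lo:
        return "strong link, weak cooperation"          # Formula 21
    if nu_ttn <= lo and nu_atn >= hi:
        return "weak link, strong cooperation"          # Formula 22
    return "intermediate"

print(regime(0.9, 0.05))  # a Zipfian-style disparity: Formula 21
```

Counting how many topic-pair arcs of two networks fall into each regime is one simple way to operationalize the similarity question raised above.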
The upper right ATN (A1) is computed by globally weighting co-authorship activities based on Wikipedia (as explained in Section 3.2.3); the ATN (A2) below is calculated by weighting these activities relative to the city wiki itself. Figure 7 shows that the topic with DDC number 720 (Architecture) is weighted considerably higher in the ATNs than in the TTN T1.

Fig. 7. TTN (T1) of a city wiki (see Table 4 for statistics about this wiki) computed using the procedural model of Section 3.2. Top right shows the ATN for which (co-)authorship activities are estimated by means of Wikipedia (see Section 3.2.3). The ATN for which these activities are estimated via the wiki itself is displayed below. The visualizations are carried out by means of PolyViz [136] regarding the 2nd level of the DDC: nodes are labeled (with numbers denoting the respective 2nd-level class) and colored to encode their membership to one of the top 10 DDC classes (see appendix). The higher the weight of a topic, the larger the node, and the higher the weight of an arc, the thicker the line. Node and line sizes are defined relative to the maximum vertex and arc weights of the underlying network.
This is all the more pronounced in A2, where 720 becomes the most prominent topic and consequently displaces the top subject from T1, that is, topic 380 (Commerce, communications & transportation). That is, although topic 380 is most frequently addressed in this wiki's texts, topic 720 is not only almost as salient, but also attracts many more activities among its interacting coauthors. Similar observations concern the switch of the roles of the topics 910 (Geography & travel) and 940 (History of Europe) from T1 to A1 and A2.
Regardless of the answer to this and related questions, we will also ask whether the shape of an ATN can be predicted if one knows the shape of the corresponding TTN and vice versa. To answer this question, we will consider LMNs of different text genres: of city wikis and regional wikis on the one hand and extracts of encyclopedic wikis on the other. We expect that LMNs spanned over corpora of the same genre exhibit a pattern of collaboration- and intertextuality-based networking that makes TTNs and ATNs derived from them mutually recognizable or predictable, whereas for LMNs generated from corpora of different genres this does not apply.
For reasons of formal variety we now consider an alternative to author topic networks, namely so-called word topic networks, which in turn are derived from Definition 3.4:

Definition 3.6. A Word Topic Network (WTN) is a two-level topic network T(L_1, {L_3}) generated by means of the text and the lexicon layer. This definition departs by five new relational arguments from Definition 3.5, which, if being instantiated appropriately, can be motivated as follows:

(1) a →_{ν_{3.1}} x quantifies evidence about the role of word a as a lexical constituent of text x, possibly in relation to all other texts in which a occurs. Typically, ν_{3.1} is implemented by a global term weighting function [123] or by a neural network-based feature selection function.
(2) x →_{ν_{1.3}} a quantifies evidence about the role of the word a as a lexical constituent of the text x, possibly in relation to other lexical constituents of x. Typically, ν_{1.3} is a local term weighting function, such as normalized term frequency [123], or a topic model-based function.
(3) a →_ϑ v represents evidence about the word a being associated with the topic v, possibly in relation to all other topics of V_C.
(4) v →_{ϑ^←} a calculates evidence about the extent to which the topic v is prototypically labeled by the word a, possibly in relation to all other words in V_3.
(5) a →_{ν_3} b quantifies evidence about the extent to which the word a associates the word b. Typically, ν_3 is computed by means of word embeddings [103]. Based on this list we better understand what topic networks offer in contrast to TMs. This concerns the flexibility with which we can include informational resources computed by different methods (e.g. based on neural networks, topic models, LSA, etc.) to generate topic networks (cf. challenge P5 on page 9). Different relational arguments X →_Z Y can be quantified using different methods, which in turn can belong to a wide range of computational paradigms. Table 1 gives an account of the generality of our approach by hinting at candidate procedures for computing the different relations of Figure 5.
Table 1. Relational arguments of Figure 5 and candidate procedures for weighting the corresponding arcs (last column).

Example 3.5. Starting from Example 3.3 to exemplify arcs between topics in word topic networks, we have to additionally explore evidence regarding the lexical relatedness of the vocabularies of the texts x_1 and x_2. In Example 3.1, we assumed that the intersection of both texts (represented as bags-of-words) is given by the set {w_1, w_2}. By analogy to Example 3.4, we assume now a corresponding simplification of the function δ of Definition 3.4. In this scenario, we have to instantiate Definition 3.4 as follows: v̄ ← t_1 = λ(v_1), w̄ ← t_2 = λ(v_2), x = x_1, y = x_2, r = w_1 and s = w_1 for one summand and, everything else being constant, r = w_2 and s = w_2 for a second summand (for x_3 (x_4) we do not assume a lexical relatedness w.r.t. the words of text x_4 (x_3)). Note that under this regime, we assume that relatedness of lexical constituents only concerns shared usages of identical words; of course, this is a simplifying example. By analogy to the setting of Example 3.4 we thus conclude that ν((v_1, v_2)) = 4 is the weight of the topic link from v_1 to v_2 in the corresponding WTN. For texts x_3, x_4 we may alternatively assume that lexical relatedness does not only concern shared lexical items but also relatedness that is measured, for example, by means of a terminological ontology [21] or by means of word embeddings [103]. In this way, we may additionally arrive at a topic link between v_2 and v_3. In order to allow for a comparison of a WTN with its corresponding TTN, a more realistic weighting scheme is needed that also reflects above and below average lexical relatedness of the lexical constituents of interlinked texts; in Section 3.2 we elaborate such a model regarding ATNs in relation to TTNs. Figure 8 gives a schematic depiction of the scenario of WTNs as elaborated so far.
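A sketch of this weighting scheme follows, under two assumptions flagged here: lexical relatedness is restricted to shared identical words (as in the example), and each shared word contributes one unit per direction of the word-text margin arcs (an illustrative reading chosen so that the example's value of 4 is reproduced). The bag of x_2 is not fully given in Example 3.1, so a filler word w5 is made up; only the overlap {w1, w2} matters.

```python
# Sketch of the WTN arc weighting of Example 3.5 (one plausible reading).
bags = {"x1": {"w1", "w2", "w3"},   # x1's bag as given in Example 3.1
        "x2": {"w1", "w2", "w5"}}   # x2's bag is elided; w5 is made up
A1 = {("x1", "x2")}                 # intertextual arc of weight 1
topic = {"x1": "t1", "x2": "t2"}    # topic assignment of Example 3.3

def lexical_contribution(x, y):
    shared = bags[x] & bags[y]
    # Each shared word yields two unit summands (one per margin direction).
    return 2 * len(shared)

nu = {}
for x, y in A1:
    arc = (topic[x], topic[y])
    nu[arc] = nu.get(arc, 0) + lexical_contribution(x, y)

print(nu[("t1", "t2")])  # 4, as in Example 3.5
```

Replacing the identity check behind `bags[x] & bags[y]` by an embedding-based relatedness score would yield the alternative scheme sketched for x_3, x_4.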
Fig. 8. Schematic depiction of the informational sources of linking topics (red vertices) in word topic networks as a function of the textual relatedness of two texts (blue vertices) (that belong to layer L_1 of a corresponding LMN, see Definition 3.1) and the lexical relatedness of corresponding words (orange vertices) (that belong to layer L_3 of a corresponding LMN). Bidirectional red arcs denote arcs of the corresponding margin layers in Definition 3.1.

It is worth emphasizing that instead of the (language-systematic) lexicon layer L_3, we may use a constituent layer L_k, k > 3, to infer a two-level topic network. For example, we can use the layer spanned by the sentences of the pivotal texts to obtain a sort of sentence topic network. In this case, a →_{ν_k} b may quantify evidence about the extent to which the sentence a entails the sentence b or the extent to which the sentence a is similar to the sentence b etc., while x →_{ν_{1.k}} a may quantify evidence about the extent to which the sentence a is thematically central for the text x etc. In sentence topic networks, topic linkage is a function of sentence linkage: prominent topics emerge from being addressed by many sentences, while prominent topic links arise from the relatedness of many underlying sentences. Another example of inferring two-level topic networks is to link topics as a function of places mentioned (by means of toponyms) within the texts of the underlying corpus X, where geospatial relations of these places can be explored to infer concurrent topic relations: if place p is mentioned in text x about topic v and place q in text y about topic w, where the platial relation R(p, q) relates p and q, this information can be used to link the topic nodes v, w in the corresponding topic network. As a result, we obtain networks manifesting the networking of topics as a function of parallelized geographical relations.
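The platial linking scheme just described can be sketched as follows; all data (places, the relation R) are hypothetical, with Dresden merely reused from the wiki example and Meissen an invented neighbor.

```python
# Sketch of topic linking via platial relations (all data hypothetical):
# if place p is mentioned in text x about topic v, place q in text y about
# topic w, and R relates p and q, the topic pair (v, w) receives a unit of
# link weight.
topic = {"x1": "t1", "x2": "t2"}
places = {"x1": {"Dresden"}, "x2": {"Meissen"}}
R = {("Dresden", "Meissen")}   # e.g. a geospatial neighborhood relation

nu = {}
for x, tx in topic.items():
    for y, ty in topic.items():
        hits = sum(1 for p in places[x] for q in places[y] if (p, q) in R)
        if hits:
            nu[(tx, ty)] = nu.get((tx, ty), 0) + hits

print(nu)  # {('t1', 't2'): 1}
```

Swapping R for any other binary relation over mentioned entities yields the further two-level variants (entailment, sentiment polarity, co-reference) discussed next.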
Obviously, any other relationship (e.g., entailment among sentences, sentiment polarities shared by linked texts, co-reference relations etc.) can be investigated to induce such two-level networks. And even more, we can think of n-level networks in which several such relationships are explored at once to generate topic links. We can ask, for example, which locations are linked by which geospatial relations while being addressed in which sentences about which topics, where these sentences are related by which sentiment relations. Another example is to ask which authors prefer to write about which topics while tending to use which vocabulary: the higher the number of authors who use the same words more often to write about the same topic, and the higher the number of such words, the higher the weight of that topic. In this case, topic weighting is a function of frequently observed pairs of linguistic (here: lexical) means and authors. On the other hand, the higher the degree of co-authorship of two authors contributing to different topics and the higher the degree of association of the words used by these authors to write about these topics, the higher the weight of the link between the topics. This concept of a topic network induced by the text, the co-authorship and the lexicon layer of an LMN is addressed by the following generalization, which provides a generation scheme for topic networks:

Definition 3.7. Given a definitional setting S = (C, θ, L(X, l)) according to Definition 3.2, an (L_1, L′)-Topic Network T(L_1, L′), L′ = {L_{i_1}, . . . , L_{i_n}} ⊆ L \ {L_1}, is a vertex- and arc-weighted simple directed graph which is said to be derived from S and inferred from L_1 and the elements of L′ by means of the optional classifiers θ^← and ϑ^←_{i_j}, ∀i_j ∈ {i_1, . . . , i_n}, and monotonically increasing functions α, β, γ, δ, where µ : V → R⁺ is a vertex weighting function, ν : A → R⁺ an arc weighting function, λ : V → V_C an injective vertex labeling function, V_C(V) = {λ(v) | v ∈ V} ⊆ V_C, and κ an injective arc labeling function.
For |L| = n, we say that T(L1, L) is an m-level, m = n + 1, topic network generated by the generating layer L1 and the elements of L. If L = ∅, Formula 25 reduces to Formula 9 and Formula 27 to Formula 11. By omitting an optional classifier ϑ←ij ∈ {ϑ←ij | j ∈ {1, . . . , n}}, expressions of the sort r ↔ f change to r → f; θ and ϑij are treated analogously. In order to derive an undirected m-level topic network T(L1, L) = (V, E, µ, ν, λ, κ) from its directed counterpart, we define the corresponding edge and weight aggregations by means of two monotonically increasing functions ζ1, ζ2.
Evidently, Definition 3.7 is a generalization of Definition 3.3, since it considers higher numbers of generating layers. A schematic depiction of the scenario addressed by this definition is shown in Figure 10 by the example of a 3-level topic network that explores evidence about topic linking starting from the text, the author and the lexicon layer of Definition 3.1. Likewise, Figure 11 depicts an n-level topic network, n > 3, in which additional resources are explored beyond the word, author and text level. Figure 5 illustrates more formally the inference process underlying Definition 3.7, and in particular the arguments used. It illustrates the inference of an arc that connects two topics by exploring the links of the text, author, and lexicon layers of an underlying LMN. In this example, the blue and black arcs are evaluated to determine the weights of red arcs connecting the focal topic nodes, while blue arcs are used to orientate inferred arcs. We will not develop this apparatus further, nor will we empirically examine (n + 1)-layer topic networks for n > 2. Rather, the apparatus developed so far serves to demonstrate the generality, flexibility and extensibility of our formal model.
Above we explained that one of the reasons for introducing a flexible and extensible formalism of topic networks is to compare topic networks derived from different layers (e.g. from the text

Fig. 10. Schematic depiction of informational sources explored to link topics (red vertices) in a 3-level topic network as a function of the textual relatedness of texts (blue vertices) (belonging to layer L1 of Definition 3.1), the social relatedness of corresponding authors (green vertices) (belonging to layer L2 of Definition 3.1) and the lexical relatedness of corresponding words (orange vertices) (belonging to layer L3 of Definition 3.1). In this scenario, thematic relatedness is the information to be inferred, while textual, lexical and social relations concern given information or evidence. Bidirectional red arcs denote arcs of corresponding margin layers of Definition 3.1.

Fig. 11. Schematic depiction of informational sources explored to link topics (red vertices) in an n-level topic network, n > 3, as a function of the textual relatedness of texts (blue vertices) (belonging to layer L1 of Definition 3.1), the social relatedness of corresponding authors (green vertices) (belonging to layer L2 of Definition 3.1), the lexical relatedness of corresponding words (orange vertices) (belonging to layer L3 of Definition 3.1) and additional layers of contextual patterns concerning, for example, the underlying medium, genre or register instantiated by the texts under consideration.
layer on the one hand and the author layer on the other). In order to systematize this approach, we finally introduce the concept of a multiplex topic network, which is derived from the same or from different linguistic multilayer networks:

Fig. 12 (caption, excerpt): … (see Table 5 for the corresponding corpus statistics). Obviously, the most prominent 2nd-level DDC class in both TTNs is 510 (Mathematics).

Fig. 13 (caption, excerpt): … (see Table 5 for the corresponding corpus statistics). Obviously, the most prominent 2nd-level DDC class in both TTNs is 620 (Engineering). Compared to the example in Figure 12, the 2nd orbit is now thematically much more diversified.

Fig. 14 (caption, excerpt): … (5) and network similarity analysis (6), both based on Section 3.2.6; machine learning of network classifiers (7) and classification analysis (8), both based on Section 3.2.7; and, finally, time series analysis of topic networks (which will not be performed here) (9).
such that each Mi, i ∈ {1, . . . , k}, is an (L1, Li)-Topic Network derived from S according to Definition 3.7, and for each i, j ∈ {1, . . . , l}, i ≠ j, Di,j ∈ D, |D| = k(k − 1), is called a margin layer fulfilling the following requirements: See Figure 9 for a schematic depiction of the comparison of two MTNs. Note that because of Definition 3.7 it does not necessarily hold that VC(Vi) = VC(Vj), but it always holds that VC(Vi), VC(Vj) ⊆ VC. In this respect, we depart from [16], who instead require more strongly that Vi = Vj. In the case of topic networks, this would be too restrictive, as different topic networks derived from the same definitional setting can focus on different subsets of topics, while ignoring the rest of the topics in the codomain VC of θ.5 In this paper, we quantify similarities of the different layers of MTNs to shed light on Hypothesis 1. More specifically: we generate an LMN for each corpus of a set of different text corpora in order to derive a separate two-layer MTN from each of these LMNs, each consisting of a TTN and an associated ATN. Then, among other things, we conduct a triadic classification experiment: firstly with respect to the subset of all TTNs derived from our corpus, secondly with respect to the subset of all corresponding ATNs, and thirdly with respect to the subset of all TTNs in relation to the subset of the corresponding ATNs (see Figure 16). In the next section, we explain the measurement procedure for carrying out this triadic classification experiment.

3.2 A Procedural Model of Topic Network Analysis
In order to instantiate topic networks as manifestations of the thematic networking of places, we employ the procedure depicted in Figure 14. It combines nine modules for the induction, comparison and classification of topic networks.
3.2.1 Module 1: Natural Language Processing. Preparatory for all modules is the natural language processing of the input text corpora. To this end, we utilize the NLP tool chain of TextImager [63] to carry out tokenization, sentence splitting, part-of-speech tagging, lemmatization, morphological tagging, named entity recognition, dependency parsing [17] and automatic disambiguation, the latter by means of fastSense [137]. For more details on these submodules see [36, 137]. As a result of Module 1, the topic classification can be fed with texts whose lexical components are disambiguated at the sense level. As a sense model, we use the disambiguation pages of Wikipedia, currently the largest available model of lexical ambiguity.
3.2.2 Module 2: Topic Classification. According to Definition 3.2, the derivation of TNs from LMNs requires the specification of a Reference Classification System (RCS) C = (VC, AC). For this purpose, we utilize the Dewey Decimal Classification (DDC), a system that is well-established in the area of (digital) libraries. As a result, the generalized tree C from Definition 3.2 degenerates into an ordinary tree, since the DDC has no arcs superimposing its kernel hierarchy (see Figure 15 for a subtree of the DDC). As a classifier θ addressing the DDC, we use text2ddc [138], a topic classifier based on neural networks, which has been trained for a variety of languages [10].6 Starting from the output of Module 1 (NLP), we use text2ddc to map each input text x to the distribution of the 5 top-ranked DDC classes that best match the content of x as predicted by text2ddc. Since text2ddc reflects the three-level topic hierarchy of the DDC, this classifier can output a subset of 98 classes of the 2nd DDC level (two classes of this level are unspecified) and a subset of 641 classes of the 3rd DDC level for each input text.7 Thus, each topic network of each input corpus is represented on two levels of increasing thematic resolution. Note that text2ddc classifies input texts of any size (from single words to entire texts, in order to meet challenge P3, page 8) and works as a multi-label classifier for processing thematically ambiguous input texts. By using an RCS, text2ddc meets challenge P2 simply by referring to the labels of the topic classes of the DDC. Further, since text2ddc is trained with the help of a reference corpus, it can detect topics even if they occur only once in a text (this is needed to meet challenge P4), and it guarantees comparability for different input corpora (challenge P1).
text2ddc is based on fastText, whose time complexity is O(h log2(k)), where "k is the number of classes and h the dimension of the text representation" [2 72] (making this classifier competitive compared to TMs). Figures 4, 7, 12 and 13 show examples of TTNs and ATNs generated by means of text2ddc by addressing the second level of the DDC. Each of these topic networks was generated for a subset of articles of the German Wikipedia that are at most 2 clicks away from the respective start article x (for the statistics of the corpora underlying these topic networks see Section 4.1). Formally speaking, let G = (V, A) be a directed graph and v ∈ V; the nth orbit induced by v is the subgraph that is induced by the subset of vertices whose geodetic distance δ(v, w) from v is at most n (cf. [32]). We compute the first and the second orbit of a set of Wikipedia articles (so that G denotes Wikipedia's web graph). This is done to obtain a basis for comparison for the evaluation of topic networks derived from special wikis. Since Wikipedia is probably more strongly regulated than these special wikis, we expect higher disparities between networks of different groups (Wikipedia vs. special wiki) and smaller differences for networks of the same group.
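The orbit computation just defined can be sketched as follows; the breadth-first search and the toy web graph are illustrative assumptions, not part of our implementation:

```python
from collections import deque

def nth_orbit(adj, v, n):
    """Return the set of vertices whose geodetic distance from v is at most n
    (the vertex set of the nth orbit of v) in a directed graph given as an
    adjacency dict."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == n:
            continue  # do not expand beyond the maximum distance n
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

# Toy web graph: article "a" links to "b" and "c"; "b" links to "d".
web_graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
print(nth_orbit(web_graph, "a", 2))  # the 2nd orbit of "a": {'a', 'b', 'c', 'd'} (set order may vary)
```

Restricting BFS expansion to depth n keeps the computation local, which is what makes orbit-based corpus extraction tractable on a web graph of Wikipedia's size.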

3.2.3 Module 3: Network Induction.
Network induction is done according to the formal model of Section 3.1. It starts with inducing an LMN L(X, 2) for each input corpus X. That is, for each corpus X we generate a text network L1 and an agent network L2 according to Definition 3.1: (1) In this paper, X always denotes the set of texts (web documents) of a corresponding wiki W, so that the text layer L1 = (V1, A1, µ1, ν1, λ1, κ1) of the LMN L(X, 2), in which L2 is an agent network defined below, can be used to represent the web graph [7] of this wiki. Thus, for any two texts x, y that are linked in W, we generate an arc a = (x, y) ∈ A1, where ν1(a) = 1 and κ1(a) = hyperlink; further, µ1(x) and λ1(x) are defined accordingly for each x ∈ V1. (2) The author layer L2 = (V2, A2, µ2, ν2, λ2, κ2) of the LMN L(X, 2) corresponding to L1 (see Definition 3.1) is generated as follows: V2 is the set of all registered authors or TCP/IP addresses of anonymous users working on texts in X, so that ∀r ∈ V2 : λ2(r) maps to this name or IP address, respectively. Let δ(r, x) be the sum of all additions made by author r ∈ V2 to any revision of the edit history of text x; we use δ(r, x) to approximate the more difficult-to-measure concept of authorship as introduced by Brandes et al. [19]. Then we define ∀r ∈ V2 : µ2(r) = Σx∈V1 δ(r, x). Further, A2 is the set of all arcs (r, s) between users r, s ∈ V2 for which there is at least one text x to which both contribute, so that δ(r, x), δ(s, x) > 0. Then, we define the arc weights ν2(r, s) [cf. 99]. Finally, κ2(a) = coauthorship. Obviously, L2 is symmetric. Now, given the definitional setting (C, θ, L(X, 2)), where C and θ are instantiated in terms of Section 3.2.2, we induce a TTN T(L1) = (VL1, AL1, µL1, νL1, λL1, κL1) according to Definition 3.3 by means of appropriately defined monotonically increasing functions α1, β1, γ1, δ1. To this end, we utilize the set
θVCx = {θx(v) | v ∈ VC : θx(v) > θmin} of the membership values of text x ∈ V1 to the topics in VC, where the parameter θmin denotes a lower bound of an acceptable degree of aboutness. We set θmin = 0. Further, by θ̄ we denote the mean value of the set Y = ∪x∈V1 θVCx of selected topic membership values, and by max(X, m) we denote the m-th largest value, m ∈ {1, . . . , |X|}, of the arbitrary set X. Finally, we select a number 0 < m⊥ < |VC| and define ∀v ∈ V, ∀x ∈ V1, thereby instantiating the parameters α, β, γ, δ of Formulas 8-11 of Definition 3.3 (see Formula 37). In this paper, we experiment with m⊥ = 5. The higher the value of m⊥, the more sensitive the generation of T(L1) to the thematic ambiguity of the underlying texts. However, since θ creates a membership value for each pair of texts and topics, we use θ̄ as a lower bound of aboutness (in the sense of addressing a topic known by θ), so that irrelevant classifications θx(v) do not affect µL1(v).
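A minimal sketch of how the author layer L2 of Module 3 can be induced from edit histories. The symbol δ(r, x), the toy deltas and the unweighted co-authorship arcs are simplifying assumptions; the actual arc weights ν2 are omitted here:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical edit histories: delta[(author, text)] = total additions by the
# author across all revisions of the text (the quantity written delta(r, x) above).
delta = {("r", "x1"): 120, ("s", "x1"): 30, ("r", "x2"): 10, ("t", "x2"): 5}

def author_layer(delta):
    """Induce the author layer: vertex weights sum each author's additions,
    and two authors are linked iff they co-edited at least one text."""
    mu = defaultdict(int)     # vertex weights mu2(r)
    texts = defaultdict(set)  # text -> its active authors
    for (r, x), d in delta.items():
        if d > 0:
            mu[r] += d
            texts[x].add(r)
    edges = set()
    for authors in texts.values():
        for r, s in combinations(sorted(authors), 2):
            edges.add((r, s))  # labeled "coauthorship"; symmetric by construction
    return dict(mu), edges

mu, edges = author_layer(delta)
print(mu)     # {'r': 130, 's': 30, 't': 5}
print(edges)  # contains ('r', 's') and ('r', 't'); set order may vary
```

Summing additions over all revisions approximates authorship without having to attribute individual tokens to authors, in line with the Brandes et al. simplification mentioned above.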
Regarding the ATN T(L1, L2) corresponding to the TTN T(L1), we have to define monotonically increasing functions α2, β2, γ2, δ2. To this end, we use several auxiliary functions: • By δ̄(·, ·) we denote the mean activity per author per Wikipedia article.
• By |δ(·, ·)| we denote the average number of active authors per Wikipedia article.
The corresponding estimators are found in Table 2. Now, consider the set V2(x) of all active authors of text x and the set θ(V1) of all texts that potentially contribute to µL2(v) and thus to the weight of the vertex v ∈ VL2. Then we define the following functions and ratios: scale is a function which is used to rescale below- or above-average values (see Formula 43); Formula 44 defines the mean of the rescaled numbers of active users per article in θ(V1). Based on these preliminaries and regarding the vertex weighting function µL2, we define ∀v ∈ V, ∀r ∈ V2, thereby instantiating the functions α and β of Formula 13 of Definition 3.4. In the present paper, we experiment with p = 2. To understand this definition, we have to run through the cases of Formula 46: (1) The case δ(r, x) = δ̄(·, ·): Suppose that for each x ∈ θ(V1) the following condition holds: ∀r, s ∈ V2(x) : δ(r, x) = δ(s, x) = δ̄(·, ·). In this case, we obtain for each x ∈ θ(V1) the following result. In other words: if all authors of all texts contributing to the weight of a topic contribute to these texts according to the average activity, the weight of this topic in the ATN corresponds to that of the corresponding TTN. In this case, the average activity does not bias the weight of a topic in the ATN compared to the same topic in the corresponding TTN. Obviously, this scenario gives us a neutral point or, more specifically, a calibration point for the comparison of ATNs and TTNs. Such a calibration point allows us to interpret any down- or upward deviation of the topic weights in both networks, since no deviation means average activity and an average number of active users. However, this consideration presupposes that ω = 1, so that α2 = α1 = id.
If ω > 1, then the number of authors of texts contributing to the weight of v is on average higher than expected on the basis of Wikipedia, so that the weight of the topic in the ATN is "biased upwards" compared to the weight of the same topic in the corresponding TTN. Conversely, if ω < 1, then the number of authors of texts contributing to the weight of v is on average smaller than expected, so that v's weight in the ATN is "biased downwards" compared to the weight of the same topic in the corresponding TTN. This scenario teaches us the different roles of α2 and β2 with respect to the weighting of the β1 values: while β2 operates as a function of the activities of authors, α2 considers their number. (2) The case δ(r, x) < δ̄(·, ·): suppose for each s ≠ r that δ(s, x) = δ̄(·, ·) while δ(r, x) < δ̄(·, ·). Then we conclude: thus, for p > 1 we penalize the contribution of a below-average active author of a text to the weight of the topic to which this text contributes. The different effects of ω ≠ 1 have already been discussed. (3) The case δ(r, x) > δ̄(·, ·): if we suppose now that ∀s ≠ r : δ(s, x) = δ̄(·, ·) while δ(r, x) > δ̄(·, ·), we conclude that for p > 1 we reward the contribution of an above-average active author of a text to the weight of the topic to which this text contributes.
In a nutshell, α2 and β2 implement the following proportionality assumptions:
• By α2 we penalize or reward below- or above-average co-authorship: the higher the above-average number of authors contributing to the texts of a topic, the higher the reward effect and thus the weight of the topic. And vice versa: the lower the below-average number of authors contributing to the texts of a topic, the higher the penalty effect and thus the lower the weight of the topic.
• By β2 we penalize or reward below- or above-average activities of single authors: the higher the above-average activity of a single author contributing to a text of a topic, the higher the reward effect and thus the contribution of this author-text pair to the weight of the topic. And vice versa: the lower the below-average activity of a single author contributing to a text of a topic, the higher the penalty effect and thus the lower the contribution of this author-text pair to the weight of the topic.
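One way to realize such a penalty/reward scheme is to raise the ratio of an observed value to its mean to the power p. This is only a hedged sketch of the idea behind α2 and β2, not our exact Formula 46:

```python
def scale(value, mean, p=2):
    """Hedged sketch of a penalty/reward rescaling: ratios to the mean are
    raised to the power p, so below-average values (< 1) shrink and
    above-average values (> 1) grow; p = 1 leaves the ratio unchanged."""
    return (value / mean) ** p

mean_activity = 50.0
print(scale(50.0, mean_activity))   # 1.0  -> neutral calibration point
print(scale(25.0, mean_activity))   # 0.25 -> penalized
print(scale(100.0, mean_activity))  # 4.0  -> rewarded
```

The fixed point at the mean illustrates the calibration point discussed above: average behavior leaves ATN weights identical to TTN weights, while deviations are amplified by p.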
Finally, we define the functions γ2 and δ2 to get instantiations of the functions γ and δ of Formula 14 of Definition 3.4 (or, in the generalized case, of Formula 27 of Definition 3.7). This is done by means of the following auxiliary function, where ν̄2 estimates the average degree of co-authorship in Wikipedia according to Formula 33.8 ν̂2(r, s) is a readjustment of ν2(r, s) in relation to the mean value ν̄2: the higher the above-average co-authorship, the higher the value of ν̂2, and the lower the below-average co-authorship, the lower the value of ν̂2. Then, we define: In this definition, β2(θx(v)) quantifies the link x →θ v and, analogously, the link r →θ v (cf. Formula 14), the product β2(θx(v))β2(θy(w)) quantifies the link x →ν1 y, and ν̂2(r, s) quantifies the link r →ν2 s. The calibration point of arc weighting is now reached under the conditions of the following scenario (for the first two conditions see above): under these conditions, the authors r and s contribute to the texts x and y at an average level while interacting at an average level of co-authorship. In this case, the (co-)authorship of both authors does not influence the strength of the corresponding arc in the ATN: neither in terms of reducing nor of increasing ν2(v, w). Note that the size of an ATN (i.e., the number of its arcs) is always less than or equal to that of the corresponding TTN, since the arcs present in a TTN are merely re-weighted in the corresponding ATN: no new arcs are added. The same holds for the order of the ATN, since there is no node in a TTN for which there is no author authoring it. Our instantiation of multiplex text and author topic networks has shown two points: firstly, we demonstrated a single parameter setting as an element of a huge parameter space spanned by parameters such as p, ν̄2, δ̄(·, ·), |δ(·, ·)|, θ, α1, α2, β1, β2, γ1, γ2, δ1, δ2 etc.
Secondly, anyone who complains about the apparently inherent parameter explosion in our approach should consider the hyperparameter spaces of neural networks as an object of parameter optimization. Regardless of the heuristic character of our approach, and compared to the black-box character of neural networks, its settings are extensible on the basis of the schematic framework provided by Definition 3.8 of MTNs and the definitions it is based upon. At the same time, this approach guarantees interpretability as long as the different ingredients entering our model via formulas of the sort of Formula 25 and Formula 27 fulfill this condition, in order to meet challenge P5.

3.2.4 Module 4: Network Randomization.
Randomization is conducted to assess the significance of our findings. This is necessary because there is currently no related classification in the area examined here that could serve this role. To fill this gap, we compute the following randomizations: (1) Baseline B1: A lower bound of a baseline is obtained by randomly assigning the object networks onto the gold standard (target) classes. This can be done by informing the assignment about the true cardinality of these classes (B11) or not (B12). We opt for B11 since this variant yields a higher F-score, making it more difficult to surpass. Of course, any serious network representation and classification model should go beyond this baseline. B1 will be averaged over 100,000 iterations.
(2) Baseline B2: An alternative is to randomize the input networks and to derive vector representations (according to Section 3.2.3), which ultimately undergo the same classification process as the original networks. That is, the input networks are randomly rewired to generate Erdős-Rényi (ER) graphs, for which we ask whether they are separable by the same classification model.10 If this is successful (in terms of high F-scores11), then we conclude that the network representation model or the operative classifier is not informative enough regarding the hypothetical class memberships of the input networks. Conversely, the lower the average F-scores obtained by classifying the randomized networks compared to the classification of the original ones, the more informative the representation model or the classification procedure regarding the underlying hypotheses. By keeping the model constant while varying the classifier, we can ultimately attribute this (non-)informativity to the underlying representation model. Conversely, by keeping the classifier constant while varying the model, we can attribute this informativity to the classification model. B2 will be repeated 100 times.
(3) Baseline B3: A third baseline results from randomizing the matrices that form the input of the target classifiers. This means that instead of calculating graph invariants or similarity values to feed the classifiers, we use matrices whose dimensions are chosen uniformly at random from the domain of the corresponding invariants or (dis-)similarity measures.12 If the classification based on the original networks does not exceed this baseline, we are again informed about a deficit of our representation model. Evidently, we are looking for models that significantly exceed this baseline; otherwise we would have to accept that the same classifiers perform better on random values than on our feature model. B3 will be repeated 100 times. (4) Baseline B4: Finally, we start from randomly reorganizing the set of observations into random classes while using the same representation model to separate the resulting random gold standard.13 We choose the variant of using randomized cardinalities of the random classes rather than keeping the sizes of the gold standard; tests have shown that this approach tends to generate higher F-scores than the latter. If our network representation and classification model does not outperform this baseline, we learn that the underlying invariants used to characterize the networks are not specific enough: rather, they can be related to random classifications of the same objects using the same feature space. Obviously, we are looking for a model characterizing the gold standard (tendency to specificity) and not a random counterpart of it (tendency to non-specificity). B4 is averaged over 100 repetitions.
B1 is a lower bound: models that fall under this bound are obsolete. B2 concerns the evaluation of the network representation or classification model. B3 focuses on evaluating the classification model, and B4 aims to evaluate the specificity of the operative feature model.
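Baseline B1 (variant B11) can be sketched as follows. The gold standard and the iteration count are toy assumptions; in this single-label setting, the averaged micro-F-score equals the expected random agreement (here about 0.5):

```python
import random

def random_assignment_f1(gold, iterations=10000, seed=0):
    """Baseline B1 (variant B11): permute the gold labels, i.e. assign objects
    to classes at random while preserving the true class cardinalities, and
    average the resulting micro-F-score (= accuracy in this single-label case)."""
    rng = random.Random(seed)
    labels = list(gold)
    total = 0.0
    for _ in range(iterations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        total += sum(a == b for a, b in zip(labels, shuffled)) / len(labels)
    return total / iterations

# Hypothetical gold standard: six networks in two classes of size 3 each.
gold = ["wiki", "wiki", "wiki", "special", "special", "special"]
print(round(random_assignment_f1(gold), 2))
```

Informing the assignment about the true class cardinalities (B11) is exactly what shuffling the label multiset achieves; the uninformed variant B12 would instead draw each label independently.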

3.2.5 Module 5: Network Quantification.
Module 5 is a preparatory step for a subset of network similarity measures. This relates to so-called topology-based approaches to graph similarity [1, 83, 84, 93, 95]. The idea behind this approach is to map input networks onto vectors of graph indices or invariants in order to compare them with each other. That is, graph similarity is traced back to similarity in vector space: the higher the number of indices for which two graphs resemble each other, the more similar the graphs. The apparatus that we employ in this context is described next.

10 An alternative, not considered here, would be to randomize the topic classification of the underlying texts.
11 The F-score is a measure of the accuracy of a classification, that is, the harmonic mean of its precision and recall.
12 We require that the main diagonal of the random matrix is 1 and that it is symmetric.
13 Obviously, we have to prevent that the gold standard is ever part of the set of these randomizations.
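A toy sketch of the topology-based approach: networks are mapped onto vectors of simple invariants and compared in vector space via the cosine. The chosen invariants are illustrative assumptions, not the feature set used below:

```python
import math

def invariant_vector(n_nodes, n_edges, degrees):
    """Toy topology-based representation: a vector of simple graph invariants
    (order, size, mean degree, max degree, density)."""
    mean_deg = sum(degrees) / len(degrees)
    density = 2 * n_edges / (n_nodes * (n_nodes - 1))
    return [n_nodes, n_edges, mean_deg, max(degrees), density]

def cosine(u, v):
    """Cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

g1 = invariant_vector(4, 4, [2, 2, 2, 2])  # a 4-cycle
g2 = invariant_vector(4, 3, [1, 2, 2, 1])  # a path on 4 nodes
print(round(cosine(g1, g2), 3))
```

Note that unnormalized invariants of very different magnitudes dominate the cosine; in practice, such feature vectors are usually rescaled before comparison.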
3.2.6 Module 6: Graph Similarity Analysis. Our hypothesis about thematic networks on geographical places says that these networks are similar in terms of the skewness of their thematic focus and their network structure, regardless of whether the underlying texts are written by different communities and regardless of the framing theme. To test this hypothesis, we apply the framework of graph similarity measurement, which allows for mapping the second of these three reference points by exploring the structure of topic networks as well as features of their nodes. Since graph similarity measurement is generally known to be computationally complex, we profit from the fact that we are dealing with labeled graphs: by using alignments of the labels of the nodes of the graphs to be compared, we reduce the time complexity of these approaches enormously. The literature knows a number of approaches to graph similarity measurement. Among other things, these include the following (see Emmert-Streib et al. [37] for an overview [cf. 77, 78]; the paper does not aim at a comprehensive study of them, but focuses on a selected subset): (1) Graph Edit Distance (GED)-based approaches [22, 69, 140] and their relatives (e.g. the Vertex and Edge Overlap (VEO) [114]), (2) spherical [32] or neighborhood-related approaches [cf. 78] and (3) network topology-related approaches [1, 83, 84, 93, 95, 114].
We will develop and test candidates of each of these classes.
GED-based methods are well studied in the area of web mining [125]. Since we are dealing with labeled graphs, we can compute the GED directly from the vertex and edge sets of the input graphs [22, 78]. Let G1 = (V1, A1, µ1, ν1, λ1, κ1) and G2 = (V2, A2, µ2, ν2, λ2, κ2) be two TNs; then their GED is computed as GED(G1, G2) = |V1| + |V2| − 2|V1 ∩ V2| + |A1| + |A2| − 2|A1 ∩ A2|. Since we are targeting graph similarities, we consider GES instead of GED, where overlaps of vertex and arc sets are equally weighted. The same is done in the case of Wallis' approach to graph distance [140], which is adapted accordingly to get a similarity measure. A relative of GES is the Vertex/Edge Overlap (VEO) graph similarity measure [114]. Since node and arc weights are not taken into account by these measures, we compute a further variant of GES, called wges, to close this gap: wges is sensitive to arc [78] and to vertex weights of TNs, the latter measuring the membership degree of the underlying texts to the topic represented by the corresponding vertex. We say that such measures are dual weight-dependent. These measures are of high interest since they cover more information of the underlying networks than single weight- or even weight-independent measures (cf. the axiom of edge weight sensitivity of Koutra et al. [78]). GED and its relatives share a view of similarity according to which graphs are considered to be more similar the more (equally weighted) vertices and arcs they share. This notion of similarity is contrasted by spherical approaches (see above) as exemplified by DeltaCon [78]. Roughly speaking, according to DeltaCon, the more two graphs resemble each other from the perspective of their vertices, the more similar they are. Since DeltaCon is not dual weight-dependent, we consider a dual weight-dependent relative of it. To this end, we compute the cosine of the vectors of geodetic distances for each pair of equally labeled vertices.
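For labeled graphs, GES-like and VEO-like overlap measures reduce to set operations on vertex and arc sets. The equal weighting of vertex and arc overlap follows the description above, while the concrete normalization of the GES-like variant is an assumption of this sketch:

```python
def ves(g1_nodes, g1_edges, g2_nodes, g2_edges):
    """GES-like overlap on labeled graphs: since vertices carry unique labels,
    vertex and edge overlaps are direct set intersections, without costly
    subgraph matching; both overlaps are equally weighted."""
    nodes = len(g1_nodes & g2_nodes) / len(g1_nodes | g2_nodes)
    edges_union = g1_edges | g2_edges
    edges = len(g1_edges & g2_edges) / len(edges_union) if edges_union else 1.0
    return (nodes + edges) / 2

def veo(g1_nodes, g1_edges, g2_nodes, g2_edges):
    """Vertex/Edge Overlap similarity: 2(|V1 n V2| + |E1 n E2|) / (|V1| + |V2| + |E1| + |E2|)."""
    return 2 * (len(g1_nodes & g2_nodes) + len(g1_edges & g2_edges)) / \
           (len(g1_nodes) + len(g2_nodes) + len(g1_edges) + len(g2_edges))

# Toy TTNs labeled by 2nd-level DDC classes.
V1, A1 = {"510", "620", "004"}, {("510", "004"), ("510", "620")}
V2, A2 = {"510", "620", "330"}, {("510", "620")}
print(round(ves(V1, A1, V2, A2), 3))  # 0.5
print(round(veo(V1, A1, V2, A2), 3))  # 0.667
```

Both measures run in time linear in the sizes of the involved sets, which is what makes label alignment so effective compared to generic graph matching.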
Since topic networks can differ in their order, we first have to align their node sets to make them comparable; this is also needed because we aim for a dual weight-dependent measurement. The required alignment is addressed by means of the following auxiliary graphs G12 and G21: G12 and G21 are needed to make G1 and G2 comparable, whose symmetric difference V1 △ V2 can be non-empty while their vertex labeling functions share the same codomain (since G1 and G2 belong to the same multiplex topic network according to Definition 3.8). Obviously, |G12| = |G21|, so that for each v ∈ Vi, w ∈ Vij \ Vi; i, j ∈ {1, 2}, i ≠ j, there is no path from v to w in Gij. Cases in which no such path exists are denoted by v ↛ w; otherwise, if such a path exists, we denote by gedij(v, w) the length of the shortest path, that is, the geodetic distance between v and w in Gij. As we deal with graph similarities, we first transform the distance values into similarity values: gep is short for geodetic proximity. With the denominator |Vij| we penalize situations in which there is no path between v and w, that is, v ↛ w. The parameter ω ∈ {w, ¬w} specifies whether the geodetic distance ged[ω,ι]ij and the geodetic proximity gep[ω,ι]ij are computed for the weighted (w) or unweighted (¬w) variant of Gij. If ω = w, we assume that each arc weighting value is normalized by means of the non-zero maximum value assumed by the arc weighting function for this network.14 ι ∈ R+0 specifies the maximum geodetic distance to be considered: beyond this value, nodes w are considered to be of maximum geodetic distance |Vij| to v, irrespective of their real distance. For ι ≥ |Vij|, we have to compute all geodetic distances. For values of ι ≪ |Vij| (e.g. ι = 2), we arrive at variants of gepij that are less time-complex. We consider the variant ι = ∞ so that we take all path-related information into account.
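The geodetic-proximity comparison can be sketched for the unweighted case ω = ¬w, ι = ∞. The proximity transform 1/(1+d) and the averaging over vertices are simplifying assumptions of this sketch relative to Formulas 72-74:

```python
import math

def geodesic_distances(adj, source, nodes):
    """BFS distances from source; unreachable nodes get None."""
    dist = {source: 0}
    frontier = [source]
    while frontier:
        nxt = []
        for u in frontier:
            for w in adj.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    nxt.append(w)
        frontier = nxt
    return [dist.get(w) for w in nodes]

def proximity(dists, n):
    """Geodetic proximity 1/(1+d); unreachable pairs count as maximally distant."""
    return [1.0 / (1 + (d if d is not None else n)) for d in dists]

def graph_cosine(adj1, adj2):
    """DeltaCon-like sketch: average, over the aligned (same-labeled) vertex
    set, the cosine of the two per-vertex geodetic-proximity vectors."""
    nodes = sorted(set(adj1) | set(adj2))
    n = len(nodes)
    sims = []
    for v in nodes:
        p1 = proximity(geodesic_distances(adj1, v, nodes), n)
        p2 = proximity(geodesic_distances(adj2, v, nodes), n)
        dot = sum(a * b for a, b in zip(p1, p2))
        norm = math.sqrt(sum(a * a for a in p1)) * math.sqrt(sum(b * b for b in p2))
        sims.append(dot / norm)
    return sum(sims) / len(sims)

# Two small labeled digraphs sharing the same vertex labels.
g1 = {"a": ["b"], "b": ["c"], "c": []}
g2 = {"a": ["b"], "b": [], "c": []}
print(round(graph_cosine(g1, g2), 3))
```

Identical graphs score 1.0, and removing paths lowers the proximities of the affected vertex pairs, which is the behavior the alignment-based cosine is meant to capture.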
Now, we calculate the dual weight-dependent cosine of G1 and G2 as follows: cos[ω, ι, ϕ, L](G1, G2) is the weighted cosine of the vectors of geodetic proximities of the same-named vertices in G12 and G21. In this article, we consider two instantiations of the parameter ϕ: ϕ1 implements an arithmetic mean. ϕ2 is a function of the degree centrality [40] of its arguments: the more linked a topic in a network, the higher its impact onto the similarity of the input networks. The similarity view behind this approach is that while cosX[ω, ι, ϕ1, L], X ∈ {A, AV}, treats all nodes, peripheral or central, equally, cosX[ω, ι, ϕ2, L] gives central nodes more influence. Take the example of two city networks [13]: it is plausible to say that if city networks look similar from the point of view of their central places, this should have more impact on the general similarity assessment than similarities from the point of view of peripheral locations. An extension would be to use more informative node weighting measures (e.g. closeness centrality). Finally, the parameter L limits the number of vertices for which cosine values are computed. In the unlimited case, L = L12 = {λ12(v) | v ∈ V12}. It is easy to see that Formulas 72, 73 and 74 are similarity measures.

14 This means that a graph G2, which is obtained from a graph G1 by multiplying the weights of all arcs of G1 by a factor c > 0, will be equal to G1 in terms of the graph similarity measure to be introduced now (insensitivity to certain scalings).

So far we looked at measures that mostly process the arc set A of TNs. This is contrasted by measures operating on topological indices of graphs. An example is NetSimile [11], which is based on the idea of characterizing networks by vectors of graph indices that mostly draw on theories of social networks or egonets. Starting from seven local, node-related structural features (e.g.
node degree, node clustering, or size of a node's egonet15), it computes the mean and the first four moments of the corresponding distributions to generate 35-dimensional feature vectors per network, where the Canberra distance is used to compute their distances: let x⃗, y⃗ ∈ Rk be two vectors; then their Canberra distance is defined as d(x⃗, y⃗) = Σ_{i=1}^k |x_i − y_i| / (|x_i| + |y_i|).

15 See Berlingerio et al. [11] for the details of this approach.
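The Canberra distance itself is straightforward to compute; the toy vectors below are illustrative, not actual NetSimile signatures:

```python
def canberra(x, y):
    """Canberra distance of two feature vectors, as used by NetSimile to
    compare network signatures; terms with x_i = y_i = 0 contribute nothing."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(x, y) if a or b)

print(canberra([1.0, 2.0, 0.0], [1.0, 4.0, 0.0]))  # 0 + 2/6 + 0 = 1/3
```

Because each term is normalized by the component magnitudes, the Canberra distance is sensitive to relative rather than absolute differences, which suits feature vectors whose dimensions live on very different scales.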
Soundarajan et al. [127] show that NetSimile is consistently close to the consensus among all measures studied by them, showing that it approximates the results of more complex competitors. This finding makes NetSimile a first choice in any comparative study of graph similarities. Following on from this success, we introduce a topology-related approach to graph similarity, which draws on the hierarchical classification of the texts underlying the topic networks by reference to the Dewey Decimal Classification (DDC) (see Section 3.2.2). Starting from a pretest, which essentially showed that graph invariants of complex network theory [109] do not sufficiently distinguish networks from their random counterparts, we decided to calculate a series of graph indices that evaluate the assignment of topics to the second level of the DDC. More specifically, we compute three node type-sensitive variants of each of the four cluster coefficients Cws [141], Cbr [18], Cbbpv [9] and Czh [146] [cf. 73]. This variation can be exemplified by means of Cws: to derive the desired variants from Cws, we use the following scheme, where mode ∈ {intra, inter, heter} serves as a parameter to distinguish these alternatives (di is the degree of vi ∈ V): adjintra(vi) is the number of adjacent neighbors of vi ∈ V sharing their 2nd-level topic classification with vi, adjinter(vi) is the number of adjacent neighbors of vi whose identical classification differs from that of vi, and adjheter(vi) is the number of adjacent neighbors of vi whose classification differs among each other and from that of vi.16 In this way, we compute for each of the cluster coefficients Cws (unweighted), Cbr (unweighted), Cbbpv (weighted) and Czh (weighted) three variants considering intra- and inter-relational as well as heterogeneous type-sensitive clustering, so that topic networks are finally represented by 12-dimensional feature vectors, which are compared using the cosine measure.
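A hedged sketch of the type-sensitive variation of Cws: counting linked neighbor pairs per mode and normalizing by d(d−1)/2 follows the Watts-Strogatz scheme, while the exact handling of the modes in our implementation may differ:

```python
from itertools import combinations

def type_sensitive_clustering(adj, topic, v, mode="intra"):
    """Sketch of a node-type-sensitive Watts-Strogatz-style cluster coefficient:
    among the pairs of v's neighbors that are themselves linked, count only
    those matching the chosen mode, normalized by d(d-1)/2.
    'intra': both neighbors share v's topic label; 'inter': they share a common
    label different from v's; 'heter': their labels differ from each other and from v's."""
    neigh = sorted(adj[v])
    d = len(neigh)
    if d < 2:
        return 0.0
    closed = 0
    for a, b in combinations(neigh, 2):
        if b not in adj[a]:
            continue  # only linked neighbor pairs close a triangle
        if mode == "intra":
            ok = topic[a] == topic[b] == topic[v]
        elif mode == "inter":
            ok = topic[a] == topic[b] != topic[v]
        else:  # "heter"
            ok = topic[a] != topic[b] and topic[v] not in (topic[a], topic[b])
        closed += ok
    return 2 * closed / (d * (d - 1))

# Toy topic network labeled by 2nd-level DDC classes.
adj = {"v": {"a", "b", "c"}, "a": {"v", "b"}, "b": {"v", "a", "c"}, "c": {"v", "b"}}
topic = {"v": "510", "a": "510", "b": "510", "c": "620"}
print(type_sensitive_clustering(adj, topic, "v", "intra"))  # 1/3
```

Summing the three mode-specific values recovers the ordinary cluster coefficient, so the variants can be read as a decomposition of clustering by thematic homogeneity.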
We call this approach ToSi (short for topological similarity). As a result of this survey of candidate graph similarity measures, we consider the set of measures displayed in Table 3 for measuring the similarities of topic networks in order to shed light on Hypothesis 1, part (2).

Modules 7 and 8: Machine Learning and Classification Analysis.
We conduct experiments in supervised learning with the aim of training classifiers to detect the layer (TTN or ATN) to which a topic network of an MTN belongs and the genre of the corpus from which the underlying LMN is derived. That is, our machine learning starts from a set of n genres G_i, i = 1..n, each of which is represented by a set C_i = {C_ij | j = 1..n_i} of text corpora C_ij (see Figure 16). The set {C_i | i = 1..n} defines a gold standard for which we assume that ∀i, j = 1..n, i ≠ j: C_i ∩ C_j = ∅. Next, for each corpus C_ij of each genre G_i, we span an LMN L(C_ij, 2) that in turn is used to derive a two-layer MTN M(C_ij, 2) = (M_ij, D_ij) ← C_ij such that M_ij = {M_ij, N_ij} consists of exactly two topic networks: a TTN M_ij and an ATN N_ij, both derived from L(C_ij, 2). In this way, we obtain the set M_ttn of all TTNs and the set M_atn of all ATNs, both derived from L(C_ij, 2) according to Section 3.2.3. Next, each of the sets M_ttn and M_atn is randomized according to the procedure described in Section 3.2.4 (Baseline B2). In this way, we obtain the sets M̃_ttn and M̃_atn as the randomized counterparts of M_ttn and M_atn. As a result, we distinguish a range of classification experiments (Scenarios 1-14), only a subset of which will be conducted in Section 4 to tackle Hypothesis 1. We start with distinguishing TTNs from ATNs. The underlying classification hypothesis is: Hypothesis 2. Topic networks of the same layer (also called mode) (i.e. TTN or ATN) are more similar than networks of different modes. 17

16 A fourth case is that i shares its 2nd-level topic with a single neighbor while differing from the topics of all other neighbors. 17 This concerns Scenario 1 (observed data) and Scenario 6 (randomized data) in Figure 16.

Fig. 16.
From sets of corpora of different genres to multiplex topic networks and their randomizations: corpora of different genres are the starting point for spanning LMNs, which are then used to derive two-layer multiplex topic networks (|=). In a second step, randomized counterparts according to Section 3.2.4 are derived from these MTNs to obtain a further basis for evaluating their significance. In this way we arrive at fourteen candidate scenarios for classifying topic networks. The similarity of TNs will be quantified by means of the apparatus of Section 3.2.6.

Regardless of which genre (urban vs. regional vs. encyclopedic communication) the underlying corpus belongs to, Hypothesis 2 assumes that one can always distinguish TTNs from ATNs by their structure, while TTNs and ATNs are less distinguishable among themselves. This scenario is depicted in Figure 16 by Arrow 1. If we falsify the alternative to this hypothesis, we can assume that (poor, rich or moderate) thematic intertextuality, as manifested by TTNs, is different from co-authorship-based networking of topics in ATNs. Collaboration- and intertextuality-based networking would then differ in a way that characterizes their layer. In order to test genre sensitivity as disregarded by Hypothesis 2, we carry out two experiments: one in which we classify TTNs (ATNs) by genre and one in which we combine both classifications by simultaneously classifying by genre and layer. When classifying by genre, we distinguish TNs derived from city wikis (urban communication), regional wikis (regional communication) and from subnetworks of Wikipedia (knowledge communication) (see Section 3.2.2). Finally, we generate two control classes of wikis and Wikipedia-based networks outside of these three genres. The corresponding wikis are sampled in a way that their members are rather dissimilar. Our similarity measurement should therefore not work with them. In a nutshell, the underlying classification hypothesis is: Hypothesis 3.
Topic networks of the same genre are more similar than those of different genres. 18 As we consider the genre-sensitive classification in the context of the layer-sensitive one, we get different classification scenarios:

(1) Scenario 2 in Figure 16 denotes the task of training a classifier that detects TTNs of the same genre while distinguishing TTNs of different ones. If this is successful, we can assume that the TTNs analyzed here are genre-sensitive or that the communication functions that we hypothetically associate with these genres influence the structure of these TTNs.
(2) Scenario 3 from Figure 16 regards the analog experiment for the genre-sensitive classification of ATNs.
(3) Scenario 4 concerns the alternative in which the modal difference of TTNs and ATNs is ignored in order to classify topic networks independently of their modal difference according to their underlying genre.
(4) This scenario is contrasted with Scenario 5, which considers classifiers for simultaneously detecting the genre and layer of TNs. The underlying classification hypothesis is: Hypothesis 4. Topic networks of the same layer and genre are more similar than networks of different layers or genres. 19

Falsifying the alternative to part (2) of Hypothesis 1 implies that TNs derived from corpora written by different communities addressing different thematic frames (e.g. cities) nevertheless appear similar in their gestalt. Such a finding is very unlikely in cases in which the underlying corpora serve very different communication functions: Hypothesis 1 is not saying that everything is similar irrespective of the heterogeneity of the underlying function or the thematic orientation.
Thus, a genre-oriented classification showing that TNs of the same genre (serving a certain communication function and having a certain thematic orientation) are more similar than those belonging to different genres would rather correspond to such a finding. From this point of view, Hypotheses 3 and 4 are of interest: to deal with them experimentally could pave the way for testing part (2) of Hypothesis 1.
As explained in Section 3.2.4, we randomize input networks so that we obtain five additional classification scenarios, labeled 6-10 in Figure 16.
The experiments corresponding to these scenarios will be conducted here as far as they concern the baseline scenario B2 of Section 3.2.4. Furthermore, scenarios are to be enumerated which attempt to distinguish observed networks directly from their randomized counterparts. In this context, Scenario 11 aims at distinguishing TTNs from their randomized counterparts by means of the classifiers trained to detect TTNs. Analogously, Scenario 12 considers ATNs in relation to their randomized counterparts, while Scenario 13 aims to separate observed topic networks (whether ATNs or TTNs) from randomized ones. Finally, Scenario 14 extends the latter scenario by trying to additionally account for the modal difference of ATNs and TTNs. These scenarios are only listed for theoretical reasons.

EXPERIMENTATION
To test Hypothesis 1 and its relatives (i.e. Hypotheses 2, 3 and 4), we conduct several experiments using two resources: a corpus of special wikis, called the Frankfurt Regional Wiki Corpus, and a corpus of subnetworks of Wikipedia that mostly contain information about cities and regions.

Tools and Resources
The Frankfurt Regional Wiki Corpus (FRWC) contains 43 wikis collected from online wiki lists. 20 Table 4 shows the statistics of this corpus, which is divided into three genres: C relates to wikis describing certain cities, R includes wikis focusing on a specific region, while the residual class O collects wikis that are not off-topic w.r.t. regional communication but are unusual in their structure or the described rhemes. We consider only articles that are not redirects. Wiki authors use redirect pages to lead readers of articles with outdated, incorrect or alternative spelling titles to the desired target page. We remove all such redirects and rewire all affected links accordingly. As a result, the number of processed articles is smaller than their overall number (see Table 4). In addition to the FRWC, we extracted a corpus of Wikipedia subgraphs (see Section 3.2.2 for the formal definition of these graphs and Table 5 for the corpus statistics). Subsequently, we denote the two variants in this Wikipedia corpus WP R 1 and WP R 2. We choose 25 articles about cities or regions matching the titles of the wikis in the FRWC and additionally include the subgraphs of six off-topic articles to build two additional corpora, called WP O 1 and WP O 2, for purposes of comparison. We process the content, link structure and metadata (e.g. authorship-related information) of all articles in our corpora. This includes their history, that is, the chains of revisions which led to their current state. We do not consider past states of link structure and content itself but incorporate the authorship and the amount of content being added or removed per revision (see Section 3.2.3).

Table 5. Wikipedia-based corpora: number of content articles (#articles n), revisions (#revisions n) and authors (#authors n) of non-redirecting articles in WP R 1 (n = 1) and WP R 2 (n = 2) of the German Wikipedia dump from 2018-07-01 (subgraphs 1-25); the variable n codes the nth orbit (see Formula 32).
The wikis considered here are based on MediaWiki. The structure of their articles varies from wiki to wiki, so that HTML-based extractions are error-prone. To circumvent this problem, we use WikiDragon [47], a Java-based framework for importing and processing wikis offline.
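Independently of the extraction framework, the redirect resolution described above (dropping redirect pages and rewiring the affected links) can be sketched as follows; the data layout is illustrative and not WikiDragon's API:

```python
def resolve_redirects(articles, links, redirects):
    """Drop redirect pages and rewire affected links to their targets.

    articles: set of page titles; links: set of (source, target) pairs;
    redirects: dict mapping a redirect title to its target title.
    Chains of redirects are followed; cyclic redirects are discarded.
    """
    def target(t, seen=()):
        if t in redirects:
            if t in seen:
                return None  # redirect cycle: discard
            return target(redirects[t], seen + (t,))
        return t

    kept = {a for a in articles if a not in redirects}
    rewired = set()
    for src, dst in links:
        s, d = target(src), target(dst)
        if s in kept and d in kept and s != d:
            rewired.add((s, d))
    return kept, rewired
```

After this step the article count shrinks (as in Table 4), and self-links produced by rewiring a link onto its own target are removed.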
For our experiments we used, adapted and newly developed several tools, including the so-called GeneticClassifierWorkbench (GCW), a Python library for performing feature selections and sensitivity analyses in classification experiments. Since our experiments are based on feature vectors with a size of sometimes more than 100 features, a complete sensitivity analysis of all feature combinations was not possible. Therefore, we conducted a genetic search for the best performing subset of features by maximizing the F-score. That is, a population of p feature subsets is evaluated and mutated over a number of t rounds. Instances which score best are saved unchanged for the next round and partly added in a slightly mutated form. The worst performing instances are removed and replaced by random feature combinations. The Workbench is based on the Python library scikit-learn [115], allowing us to abstract from the underlying machine learning paradigm so that the same genetic search can be applied to optimize different classifiers. We experimented with neural networks, which produced similar results on our test data but took too much time to be used for genetic searches and random baseline computations. Therefore, we decided for Support Vector Machines (SVM) as the embedded method of supervised learning, using the Radial Basis Function (RBF) as a kernel. Our source code is open source on GitHub (https://github.com/texttechnologylab/GeneticClassifierWorkbench).
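A stripped-down sketch of such a genetic feature search (elitism, mutation, random re-injection) might look as follows. The fitness callable stands in for the F-score of the embedded classifier, and all parameter names and defaults are illustrative, not those of the GCW:

```python
import random

def genetic_search(n_features, fitness, pop_size=20, rounds=50, elite=4, seed=0):
    """Minimal genetic search for a feature subset maximizing `fitness`.

    fitness: callable mapping a 0/1 feature mask (tuple) to a score,
    e.g. the F-score of a classifier trained on the masked features.
    The best `elite` masks survive each round unchanged; mutated copies
    and fresh random masks replace the rest.
    """
    rng = random.Random(seed)

    def rand_mask():
        return tuple(rng.randint(0, 1) for _ in range(n_features))

    def mutate(mask, rate=0.05):
        return tuple(b ^ (rng.random() < rate) for b in mask)

    pop = [rand_mask() for _ in range(pop_size)]
    for _ in range(rounds):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:elite]                      # elitism
        children = [mutate(m) for m in survivors]    # slightly mutated copies
        fresh = [rand_mask()                         # replace the worst
                 for _ in range(pop_size - len(survivors) - len(children))]
        pop = survivors + children + fresh
    return max(pop, key=fitness)
```

Since the elite is carried over unchanged, the best score is monotone non-decreasing over rounds; as noted above, such a search may nevertheless end in a local maximum.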

Classification Experiments
We investigate the similarities of our seven corpora of regional wikis (C, R and O) and of Wikipedia-based subgraphs (WP R 1, WP R 2, WP O 1 and WP O 2) (each defining a corpus of texts) in order to test Hypothesis 1 and its derivatives, that is, Hypotheses 2, 3 and 4.
Thus, we distinguish up to seven target classes in our experiments. For reasons of simplicity, we call each element of these corpora a wiki and each of the seven classes a genre. Unless otherwise stated, the experiments are performed on all of them. In the case of WP R 2 and WP O 2, we did not induce the corresponding ATNs, as some of these would have included several million edit events. Thus, in this case we have at most five target classes. Each experiment includes three consecutive steps:
(1) The all variant: The first step, denoted by all, is a hyperplane parameter optimization and evaluation using the entire feature set. The optimized parameters of the respective classifier are then used in subsequent steps. Ideally, the parameters would be optimized independently for each step, but this would have slowed down the genetic search.
(2) The opt variant: In the second step, denoted by opt, genetic searches for optimal feature subsets are performed using a population of 20 feature vector instances and 50 rounds, trying to maximize the F-score of the classification. Note that these searches may only reach a local maximum.
(3) The ext variant: For experiments which are not conducted on random baseline data, we perform an extended genetic search for optimal feature subsets based on 20 instances and 500 rounds. In an additional step, a bit-wise genetic optimization attempts to further minimize the number of used features while keeping or even improving the F-score, using 20 instances and 500 rounds.

Graph-similarity-based classification.
Using the apparatus of Section 3.2.6, each TN (ATN or TTN) of each MTN is represented by a vector of values indicating its similarities to the wikis of the underlying experiment. Any such vector is separately computed for each of the 11 similarity measures of Table 3. Thus, if T is the set of all TNs of whatever mode (ATN or TTN) and genre (C, R etc.) and if T' ⊆ T is a subset of these TNs used in a classification experiment concerning the genres (target classes) Genre_i1, ..., Genre_ij (cf. Figure 16), then each topic network T ∈ T' is represented for each similarity measure by a |T'|-dimensional feature vector which is processed by the three-step algorithm described above. If for a given similarity measure the topic networks derived from wikis of the same genre are mapped to neighboring similarity vectors, then they belong to overlapping neighborhoods in vector space: related networks are similar in their similarity and dissimilarity relations. In this way, TNs of the same genre should become as recognizable as TNs of different genres. Now we see why a genetic search for optimal subsets of features is necessary: otherwise we would assume that all dimensions of our feature vectors are equally informative, an assumption that is probably wrong.
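As a minimal stand-in for this similarity-vector representation, the following sketch uses a node-weight-only cosine in place of the dual weight-dependent measures of Table 3 (which additionally account for arc weights):

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse weight dicts (topic -> weight)."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_features(networks):
    """Represent every topic network by its vector of similarities
    to all networks of the experiment (one dimension per network)."""
    return {name: [cosine(w, other) for other in networks.values()]
            for name, w in networks.items()}
```

Each network's feature vector then encodes its position relative to all wikis of the experiment, which is exactly what makes genetic subset selection meaningful: some reference wikis (dimensions) discriminate the target classes better than others.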
Relating to Hypothesis 3, Tables 6 and 7 summarize our findings regarding the genre-sensitive classification of TTNs and ATNs, respectively. Cosine-based measures always perform best. Especially in the case of ATNs we see that accounting for arcs and for nodes secures better performance: dual weight-dependent measures (see Section 3.2.6) outperform single weight-dependent or weight-insensitive measures. However, in the case of TTNs, we also see that as long as we do not perform an extended optimization (ext), the measure cos A V [¬w, ∞, ϕ 1, L 12], which disregards arc weights, is a best performer. Of special interest is cos A V [w, ∞, ϕ 2, L 12], the best performer regarding the classification of ATNs (Table 7), which is not only arc and node sensitive, but also weights nodes as a function of their degree centrality and therefore covers the highest amount of structural information among all candidates considered here.
This measure is also a robust candidate working at a high level in both experiments (it is the second-best performer in the case of TTNs if optimized by an extended genetic search). Thus, we conclude that spherical measures clearly outperform GED-related approaches and especially network-topology-based approaches (ToSi and NetSimile), which perform worst: the kind of information we seek is apparently ignored or "abstracted away" by the latter measures. However, NetSimile has at least a high optimization potential (see the column ext in Table 6), a potential which is missing in the case of ToSi. In any event, none of the measures considered here is outperformed by our baselines.

Table 7. F-scores of classifying ATNs into five classes (C, R, O, WP R 1 and WP O 1) by means of SVMs using RBF kernels. Column all: F-scores using all features in terms of the respective similarity measure. Column opt: using a subset of features detected according to a genetic search. Column ext: subset selection according to extended genetic optimization. Additionally, F-scores of random baselines B1, B2, B3 and B4 are displayed, in the latter three cases differentiated for the variants all and opt.

But in Table 6 we also see that B3 (opt) approaches ToSi (all); in Table 7 we make analog observations by example of other measures. A serious problem concerns NetSimile in relation to Baseline B2 regarding the classification of ATNs (Table 7): the baseline surpasses the topology-related measure whether optimized (opt) or not (all). The graph indices collected by NetSimile obviously have difficulties in making observed networks distinguishable from their random counterparts, at least in some of the cases considered here. B3 is also of interest with regard to the classification of ATNs: it achieves F-scores of up to 40% and thus makes representation models based on measures such as NetSimile, ToSi and wges problematic candidates.
The values of B4 opt are also remarkably high and can therefore be regarded as a challenge for the measures. Figure 17 shows that the baselines B1, B3 and B4 are outperformed by the results obtained for TTNs. However, it also shows that feature optimization affects the random baselines. This is particularly evident in the case of B3, which is based on random matrices. This gain in F-score can be explained by random numbers that allow the target classes to be separated, at least to some extent. These features are then selected by the genetic feature selection. The baseline results for ATNs show a similar picture (see Figure 17, right). Regarding B2, we make the following observations in Figure 17 (right) (for reasons of complexity we did not consider all measures to compute B2): although the best B2 candidates are better than the average F-scores calculated on the basis of real data, B2 is clearly surpassed on average. Thus, we come to the conclusion that we found effective measures for comparing networks; this concerns in particular the spherical approach based on the cosine measure. From these experiments we conclude:

Fig. 17. Results of the similarity measures of Table 3 underlying the F-scores of Table 6 (first six columns) and Table 7 (last six columns). Distributions are distinguished by considering all features (all) or subsets of them generated by the genetic optimizations opt or ext.
(1) Hypothesis 3 is not falsified: we know the genre of a topic network by its structure. Note that this only concerns Scenarios 2 and 3 of Figure 16; Scenario 4 is not computed here. Similarly, by calculating our baselines, this also involves Scenarios 7 and 8 while ignoring Scenario 9. The classification benefits especially from information that is explored by dual weight-dependent measures. This holds regardless of the mode (ATN or TTN).
(2) Spherical measures should be preferred to GED-based measures, and these in turn to topology-based measures:

spherical ≻ GED ≻ topological. (80)

The boxplots in Figure 18 give another perspective on the classification results by summarizing the distributions of precision and recall values generated by the graph similarity measures. Except for the results on ATNs using all features, the average precision is higher than the average recall. The figure also demonstrates the strong effect of feature selection.

Fig. 19. Results of the similarity measures of Table 3 underlying the F-scores of Table 6. Distributions are distinguished by the respective target class of the classification.
So far, we considered classifications as a whole and thus abstracted from the scores obtained for individual genres. The boxplots in Figure 19 give insights into these genre-related scores regarding the classification of TTNs by means of the extended feature optimization (ext). The members of the genre C are well identified, in terms of both recall and precision. The genre R is far less separable and causes many classification errors (low recall). Apparently, this class contains more heterogeneous TTNs. In any event, the Wikipedia-based genres WP R 1 and WP R 2 are very well separated. By contrast, instances of the category O are extremely difficult to detect (as predicted in Section 3.2.7, page 40). Similarly, elements of the classes WP O 1 and WP O 2 are difficult to identify, albeit to a minor degree. Thus we conclude: the upper bound of separability concerns Wikipedia-based regional wikis. The corresponding subgraphs are very similar. This upper bound is approached by city wikis. Region wikis are less homogeneous, making the corresponding class R rather blurred and therefore questioning its status as a genre. Figure 21 shows the corresponding results of classifying ATNs. The general picture is quite similar to that of the TTNs.
We take another perspective on the results to examine classification errors. The best result on TTNs using all features is achieved by cos A V [¬w, ∞, ϕ 1, L 12]. Figure 20 shows to what degree wikis of a target class are wrongly classified using this measure. The labels show the proportion of the categories according to the gold standard (top) and the classification result (bottom). The picture is diverse, but some details become clear: wikis of the classes R and O are often falsely categorized as C. City wikis, on the other hand, are wrongly classified as WP O 1 or WP R 1. Genetic feature selection has proven to increase the F-score significantly. In the extended optimization (ext), the last step is to minimize the number of features used. Since our features stand for similarities to networks, we have to ask whether some of the wikis underlying these networks are more relevant for the differentiation of the target classes than others, possibly because of their prototypical status. If all wikis were equally important, an equal distribution of the frequencies with which these features are selected by the genetic optimization would be expected. Figure 22 shows the corresponding rank-frequency distribution: it shows that we are far from evenly distributed features. From this we conclude that the selection of features is indispensable and that the underlying wikis play very different roles in our classification experiments.

Fig. 21. Results of the similarity measures of Table 3 underlying the F-scores of Table 7. Distributions are distinguished by the respective target class of the classification.
Next, we try to distinguish TTNs from ATNs, thereby addressing Hypothesis 2 (or more specifically Scenario 1 of Figure 16). The error analysis in Figure 23 shows that networks of these two modes are not separable using our approach. Table 8 differentiates this outcome by reporting the results obtained for different measures. It shows that this classification scenario is far exceeded by Baseline B1 and is therefore irrelevant. From this result we conclude that ATNs are so similar to their corresponding TTNs that they cannot be distinguished by our measures, or alternatively: our similarity measures are not suitable to distinguish them. This is not surprising, as the order and size of an ATN always correspond to the order and size of the TTN from which it was derived, so that they can only differ by the weighting of their nodes and arcs. Concerning Hypothesis 4, and thus distinguishing twelve target classes (in the case of WP O 2 and WP R 2 we do not induce ATNs), Table 8 shows a somewhat different scenario: though the F-scores are still rather low, Baseline B1 is clearly outperformed when using a cosine measure for graph similarity measurement. From this observation, we conclude that while Hypothesis 2 is falsified, there is at least a potential regarding the simultaneous distinction of genre and mode: ATNs do not uniformly resemble their corresponding TTNs. So far we considered part (2) of Hypothesis 1 by showing that TTNs (and also ATNs) with similar functions resemble each other, while differing from networks of other genres. It remains to be shown that these networks are also thematically focused, in a highly skewed manner. To test this, we fit power laws to the distributions of node weights in TTNs. Remember that these weights result from detecting textual instances of the topic represented by the respective node, so that the more such instances are detected, the more salient the topic in the network.
Fitting a power law to such a distribution means that there is a minority of topics, or just one topic, that surpasses all other topics in its importance, while the majority of topics is of little or no importance. The boxplots in Figure 24 (left) show the distribution of the exponents of the power laws fitted to these distributions, differentiated by the genres considered here. To assess the goodness of the fits we compute the adjusted R-squares and display the value distributions in Figure 24 (right). Obviously, the fits are very good (the adjusted R-squares are on average above 95%), while the averages of the exponents range between 0.5 and 1.5. From this analysis we conclude that the underlying wikis are all thematically focused and skewed by dealing with a minority of topics in depth. The five most frequently detected DDC labels per genre are shown in Table 9. It shows that Transportation; ground transportation is by far the most dominant topic in city wikis and in region wikis. Obviously, these wikis are thematically focused in a highly skewed manner.
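A rough sketch of such a fit, assuming a simple log-log least-squares estimate of the exponent over rank-ordered node weights and the usual adjusted R-squared (the actual fitting procedure used above may differ):

```python
import math

def fit_power_law(weights):
    """Fit w(r) = c * r^(-gamma) to rank-ordered positive node weights
    via log-log least squares; returns (gamma, adjusted R-squared)."""
    ys = sorted((w for w in weights if w > 0), reverse=True)
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three positive weights")
    xs = [math.log(r) for r in range(1, n + 1)]   # log rank
    ls = [math.log(y) for y in ys]                # log weight
    mx, my = sum(xs) / n, sum(ls) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ls))
    slope = sxy / sxx
    preds = [my + slope * (x - mx) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ls, preds))
    ss_tot = sum((y - my) ** 2 for y in ls)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)  # one predictor
    return -slope, adj_r2
```

A perfect Zipf-like distribution w(r) = c/r yields gamma ≈ 1 with an adjusted R-squared of ≈ 1; exponents between 0.5 and 1.5 with R-squared above 95%, as reported above, indicate strongly skewed but well-fitted topic distributions.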
It remains to be shown that our findings about urban wikis depend neither on the distances of the corresponding places nor on the communities writing these wikis. Figure 25 shows that the similarities detected by us hardly correlate with the underlying distances of the places. In the heatmap in Figure 25 (left), a connection between two city wikis is the greener, the closer and the more similar they are to each other, while a pair of wikis is the more red, the less similar and the more distant they are. Similarity is measured by cos[w, ∞, ϕ 1, L 12], while distance is converted into closeness and normalized to the unit interval (the values of the heatmap scale to [−1, 1] by calculating −1 + closeness + similarity). Figure 25 (right) shows that there is hardly a tendency of wikis to be more similar when the corresponding places are closer to each other. The lower similarity values are mostly induced by rather unusually small wikis such as Boppard (see Table 4). Figure 26 shows the Fuzzy Jaccard of the communities underlying the wikis, that is, the overlap of these communities weighted by the activities of their authors: the lower the number of shared authors of two wikis and the less active these authors, the lower the fuzzy overlap of these wikis. Figure 26 shows that while among the Wikipedia-based extractions the overlap is remarkably high, it hardly exists between any of the city or region wikis: these wikis are written by mostly completely different communities. The picture is not different if one considers all authors, registered and unregistered.
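A plausible reading of this Fuzzy Jaccard measure, as the activity-weighted overlap of two author communities (the exact weighting used above may differ), is the weighted Jaccard coefficient:

```python
def fuzzy_jaccard(a, b):
    """Activity-weighted overlap of two author communities.

    a, b: dicts mapping author -> activity (e.g. number of revisions).
    Weighted Jaccard: sum of per-author minima over sum of maxima, so
    the score drops with fewer shared authors and with lower activity
    of the shared authors.
    """
    authors = set(a) | set(b)
    num = sum(min(a.get(x, 0.0), b.get(x, 0.0)) for x in authors)
    den = sum(max(a.get(x, 0.0), b.get(x, 0.0)) for x in authors)
    return num / den if den else 0.0
```

Disjoint communities score 0 regardless of their size, which matches the observation above that city and region wikis, written by almost entirely different communities, show nearly no fuzzy overlap.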

DISCUSSION
Section 4 has shown that topic networks, whether TTNs or ATNs, are similar if they belong to the same genre, while they are characterized by a high degree of thematic focusing. In order to operationalize this notion of network similarity, we tested, further developed or newly developed 11 different measures of network similarity, relying on four different paradigms of measuring the similarity of graphs (see Table 3 and the discussion of graph/network similarity measures in Section 3.2.6) as instantiated by the complex networks studied here. All these measures and paradigms come along with a different notion of network similarity. We have shown that a subclass of them, especially cosine-based measures of network similarity, allows for detecting similarities of topic networks in line with Hypotheses 3 and 4. At the same time, the concept of network similarity underlying this class of dual weight-dependent measures seems to be the most promising from a research point of view, as it is based on node and arc weights and instantiates a very intuitive concept of network similarity: the more similar two networks are from the perspective of the more of their nodes, the more similar they are. Thus, at the level of thematic abstraction examined here, there seems to be a hidden tendency to write about very prominent topics when it comes to thematizing places and linking the underlying texts in such a way that the resulting networks become almost indistinguishable.
Starting from this kind of thematic distortion of VGI as conveyed by online media, we now ask for a more general explanation of our findings. The candidate we are considering for this purpose is given by Cognitive Maps (CM), which were introduced as models of the cognitive representation and processing of spatial information to explain a number of different cognitive biases. Because they bridge the gap between geographical information and its biased representation, CMs promise to be a candidate for our task. At the same time, this notion allows for the connection of cognitive geography on the one hand and our generalized model of linguistic encoding of geographical information on the other (see Figure 1). The reason is that, as mental representations, CMs are seen to integrate a wide range of representations of spatial objects, their relations and thematic units (see below). We may argue now that we developed a method to represent and analyze a particular type of thematic information which can be subsumed under the latter list. If this is true, then the thematic distortion observed by us could be seen as a result of the biased processing of geographic information by a community of agents dealing with the same place to generate a common cognitive map, thereby manifesting a particular type of distributed cognition. When creating such a common CM of the same place, agents tend to focus on a highly selected set of rhemes (see Figure 1), even if there is no explicit agreement among these agents about this selection, even if there is little or no direct communication between them, and also irrespective of the focal place. It seems that the agents participate in processes of distributed cognition in such a way that their own thematically distorted maps flow into the formation of a shared, stable but likewise distorted "thematic map".
These maps then appear as the result of a sort of swarm behavior regarding the formation of a particular distribution of the preference and salience of certain place-related rhemes. From this perspective, topic networks serve as models of these thematic maps, which in turn are parts of CMs. To underpin this interpretation, we briefly summarize the research on CMs and, above all, ask about distortions that are distinguished by the research in this area.
Understood as mental representations of spatial knowledge, CMs have been the subject of scientific work for decades. Starting from different disciplinary perspectives, this research provides insights into how people perceive their environment, think about it and how this influences their spatial behavior. The interdisciplinary research on CMs has led to a multitude of notions, research designs and outcomes, the integration of which is still pending. Over the years, researchers worked, for example, with different terms for the mental representations in question, such as cognitive maps [133], environmental images [92], mental maps [52], mental sketch maps [46], narrative space maps [62], or internal representations [116], where the constituent map is most common. However, there has been a discussion as to whether the term map is generally misleading. In this context, Kitchin [75, 3pp.] distinguishes approaches that understand CMs as (1) three-dimensional maps, (2) an analogy to maps (because of their map-like characteristics), (3) a metaphor for maps (because they function as if they were maps) or as (4) a hypothetical construct used to explain spatial behavior. While we refer to cognitive maps as an auxiliary notion, we adhere to the fourth of these variants. Regardless of this discussion, there is a greater consensus on some characteristics of CMs as mental representations: CMs are understood as complexes of mental images and concepts that humans have in mind when thinking about places, their location (in terms of distance and direction), accessibility (regarding questions like how to get there) and the meanings associated with them. They serve as a means of understanding spatial circumstances and as a frame of reference for the interpretation, preference and prediction of spatial structures, their relations and events in which they participate (see [34, 100pp., 313], [52, 3] and [92, 5p.]).
Beyond that, they also serve as a basis for decision-making regarding spatial behavior (e.g. in route planning). In a nutshell, humans activate, generate and utilize CMs in spatial thinking and spatial behavior [cf. 48,233]. CMs are distinguished according to the entities they model. Kitchin and Blades [74, 5p.] distinguish CMs of object spaces (e.g. rooms, cars), environmental spaces (e.g. buildings, streets, neighborhoods, cities), geographical spaces (e.g. regions, countries), panoramic spaces and map spaces (including models) [cf. 41]. In this way, they cover existing as well as imagined places, where facts about the former can be mixed with imaginations of the latter [35]. This list includes the kind of places that are central to our study, especially cities.
To build a bridge between the notion of CMs and our analysis, we need to look more closely at their content and the principles by which they are created. Generally speaking, CMs are seen to cover at least two types of information (see [75, 1p.] and [35, 314p.]): (1) Regarding spatial cognition, this concerns information about where entities are located in the environment of a person (location, distance and direction in relation to her location or to reference points like landmarks). (2) Regarding environmental cognition, this concerns information about the kind of these entities, their attributes, meanings, valuations and attitudes that the person associates with them, whether individually, socially or culturally mediated [48,224,235]. Our study focuses on the second part of this distinction: it is related to the rhemes that are associated with places as framing themes (see Section 1). In any event, CMs are systematically characterized by distortions [35,315] concerning judgments about locations, distances and directions as well as the formation of preferences which affect spatial or environmental cognition. One example is the localization effect [52] according to which people can discriminate nearby places better and have stronger preferences for them [see also 48]. This relates to errors in distance judgments depending on the perspective from which they are made: more differences are seen between closer areas than between more distant ones, so that shorter distances are exaggerated, while longer distances are underestimated [135,133]. Furthermore, spatial knowledge can be organized by reference to landmarks which "distort" places in their "neighborhood" so that buildings, for example, are judged to be closer to them than vice versa [135,134]. Tversky [135, 135pp.]
describes additional modes of distortion: to remember the position and orientation of objects, humans isolate them from their background and organize them by referring to a general frame of reference (rotation) or to other figures (alignment). While these examples primarily concern spatial cognition, the following bias focuses more on environmental cognition.
This concerns the hierarchical organization of conceptual systems according to which places of the same category are supposed to be closer in distance than places of different categories, while the direction of a category (with a direction slot) determines that of its members [135, 132p.]. Last but not least, Golledge and Stimson [48] describe distortions of the representation of urban spaces. They observe that interactions influence the perception of a city in the sense that spatial information accumulates along the representations of the paths used to carry out these interactions. Likewise, structural properties of cities which are more salient than others are likely to become anchor points in CMs. In such maps, areas between used paths and anchor points may appear to be "folded" or "wrapped" so that preferred, frequently visited places are represented as closer to each other. As a result, positional and relational errors can occur in perception (see [48,254] and [49,7]).
To interpret our findings in the light of this research, we need to link the formation of CMs with linguistic processes. The idea that this formation is substantially influenced by human language processing, so that geographical information is non-trivially encoded in linguistic structure, goes back to the work of Louwerse [cf. 89] (see Section 1; see also Montello & Freundschuh [107,171] for an earlier hint on "obtain[ing] spatial knowledge through language"). In this context, Golledge & Stimson [48,235] distinguish shared components of CMs from personalized ones by stating that "The common elements facilitate communication with others about the characteristics of an environment; the idiosyncratic elements provide the basis of the personalized responses to such situations". Our hypothesis is now that, at the level of thematic abstraction as modeled here, the organization of platial rhemes shared by the members of a community is influenced by the general law of preferential order which is most prominently instantiated by Zipf's first law [147]. Such an organization makes the anticipation of a place rather expectable among the members of a community, so that communication about this place is facilitated as predicted by Golledge & Stimson [48].
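As a minimal illustrative sketch (not the paper's own measurement procedure), Zipf's first law predicts that the salience of the rank-r item falls off roughly as 1/r^alpha, so that a tiny head of ranks carries most of the probability mass. The function name, the rank count and the exponent below are illustrative assumptions, not estimates from the study.

```python
# Hedged sketch: Zipfian rank-frequency model f(r) ~ 1/r**alpha.
# All parameters are illustrative, not taken from the study's data.

def zipf_weights(n, alpha=1.0):
    """Return normalized Zipfian weights for ranks 1..n."""
    raw = [1.0 / r ** alpha for r in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

weights = zipf_weights(100)
# Under a Zipfian order, a small head of ranks dominates:
head_share = sum(weights[:5])  # share held by the five top-ranked rhemes
```

With 100 ranks and alpha = 1, the five top-ranked items already account for over 40% of the total mass, which is the kind of extreme skew referred to above.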
This Zipfian organization allows for relating our findings to the well-known power-law-like degree distributions found in many natural, social, semiotic or technical networks (see [109,113] and especially [112] for overviews of this and related research) and also in many linguistic systems, especially on the text level [108,119,134]. Because of this commonality, one might assume that we just detected a well-known text or network characteristic. Characteristic for our findings, however, is that we developed a measurement procedure that detects a semantic, thematic trend related to texts or corpora with the help of network theory: Instead of counting directly observable arcs, for example, in ontological networks or co-occurrences in texts, and instead of relying on monoplex networks [1,5,8,23,38,39,98], we generated and analyzed a range of different networks in relation to each other in order to determine the corresponding thematic trend by means of multiplex networks. This is not to say that we first discovered a Zipfian process in the organization of linguistic networks, but rather that we observe such a process in a very specific area, in which it has not been observed before and which requires an appropriate explanation as elaborated so far. Indeed, if thematic salience is skewed, and if skewed topic distributions derived from different corpora are similar not only topologically but also regarding the ranking of the majority of salient topics, such an observation requires explanation given that the underlying text networks are constituted by different, distributed communities of authors. Answering this question is what this paper has been about.
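The claim that topic distributions from different corpora agree "regarding the ranking of the majority of salient topics" can be sketched as a rank comparison. The following is a hedged illustration, not the study's actual procedure: the salience vectors for the two hypothetical city corpora are invented for demonstration.

```python
# Hedged sketch (not the study's method): comparing two skewed topic
# distributions by the ranking of their shared topics.

def spearman_rho(xs, ys):
    """Spearman rank correlation of two equally long score lists (no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: -v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Two invented city corpora scoring the same ten topic categories:
city_a = [0.31, 0.22, 0.14, 0.09, 0.07, 0.06, 0.04, 0.03, 0.025, 0.015]
city_b = [0.28, 0.25, 0.12, 0.11, 0.06, 0.07, 0.05, 0.03, 0.02, 0.01]
rho = spearman_rho(city_a, city_b)  # near 1 when the salience rankings agree
```

A rho near 1 across corpora produced by different author communities is the kind of agreement that, as argued above, calls for an explanation beyond mere frequency effects.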
At this point one might further object that we made a rather expectable observation in the sense that descriptions of cities, for example, are very likely related to rhemes like traffic, trade, culture, history etc. However, this would mean underestimating our results: (i) the thematic distortions observed by us are extremely skewed, (ii) they seem to emerge rather early in the development of a wiki (see footnote 21) and (iii) they make members of the same genre similar to each other while allowing members of different genres to be distinguished. To phrase it as a question: If the number of rhemes under which places are thematized is limited, why then should a tiny subset of them always dominate the discourse about a place, and why then should the networking of these rhemes make discourses of the same genre identifiable? From this point of view, we argue that we discovered an additional form of the distortion of CMs: the underlying place is always conceptualized from the point of view of a few but extremely preferred rhemes. When organizing their distributed processes of co-authorship, communities of authors seem to strive for a kind of thematic unification that makes different wikis serving alike functions look structurally similar with respect to the preference order of themes and their networking. It seems that people participate in processes of collaborative writing with a tendency to organize their thematic contributions and references in such a way that they remain shareable [42] and communicable among members of the same community. Ensuring shareability means securing the continued existence of the underlying wiki, which could otherwise collapse because of too many personalized or individualized fragmentations.
At this point we can speculate that people unconsciously prefer thematic contributions that make their social roles and participations expectable and acceptable, and that this selection behavior produces the described similarity of thematic maps as components of CMs. In other words, the participants anticipate social roles and neglect their personal views of cities and regions, whose documentation would fragment the corresponding media thematically; they forgo reproducing their idiosyncratic, personalized views of places. To put it in terms of the distinction made by Golledge & Stimson [48] between shared and personalized components of CMs: participants overweight the former to the disadvantage of the latter in order to guarantee the shareability [42,43] of CMs as a result of distributed cognition.
Note that in our study we did not simply map a frequency effect by our measurements: although we counted frequencies of topic assignments, these assignments were determined by means of an inference process that went through a process of (machine) learning. To support such an interpretation, however, a deeper analysis with a larger corpus of wikis and related media providing different functions is required. This also requires experiments with other and, above all, much finer classification systems than the DDC in order to find out how much the use of the DDC has influenced our measurements. And it requires a deeper analysis of the social roles of authors in online media, their interactions and the regulatory systems under which they interact. But this already concerns future work.

CONCLUSION
We developed a novel model of topic networks in order to investigate the networking of rhemes addressing the same places in underlying corpora of natural language texts. We developed our network model in such a way that it enables thematic comparisons of previously unforeseen text corpora using an underlying reference corpus, offers a generic solution to the problem of topic labeling, is highly scalable and can therefore map even the smallest text snippets to topic distributions, simultaneously takes rare topics into account, and is methodologically open and expandable. Moreover, our model allows for comparatively investigating the networking of thematic units from different angles. In this way, it is open and expandable as it allows for integrating different analytical perspectives into the study of the same semantic networks. We exemplified our model by means of corpora of special wikis and extracts from Wikipedia in order to investigate how textual information encodes geographical information on the aboutness level of texts. Our experiments show that the thematizations of different places on a certain level of abstraction are similar to each other in that they focus on a few themes in a highly distorted manner while networking them in similar ways. This happens regardless of whether the underlying media are generated by different communities and whether these communities address related or unrelated places, nearby or distant. We interpreted our findings in the context of the notion of cognitive maps. To this end, we proposed to extend this notion in terms of thematic maps and argued that participants or interlocutors of online communication tend to organize their contributions in a way that makes them shareable. This means that the contributions are abstracted and depersonalized at the aboutness level in such a way that the social roles of these participants become expectable and acceptable, while their personal views of places, whose documentation would fragment the corresponding media thematically, are reduced. Ensuring shareability means securing the continued existence of the wiki, which could otherwise collapse in the face of too many personalized or individualized fragmentations. Future work concerns several tasks: We want to conduct deeper analyses based on larger corpora that manifest a greater variety of communication functions in order to shed more light on the genre sensitivity discovered in our study. Beyond the DDC, we strive for the use of finer structured, higher-resolution classification systems in order to model the contents of texts much more precisely. Ideally this should be carried out with the help of systems like the category system of Wikipedia or even Wikidata, both of which develop as open topic universes [101]. Last but not least, a deeper analysis of the social roles of authors in online media and their co-authorship is required to gain a deeper understanding of the processes of linguistic encoding of geographical information. This will be the task of future work.

Footnote 21: This is not shown here, but is the result of a pretest in which we looked at the life cycles of three different wikis. In future work we will analyze the underlying time series of multiplex topic networks in detail.

ACKNOWLEDGMENT
Financial support by the Federal Ministry of Education and Research (BMBF) via the Centre for the Digital Foundation of Research in the Humanities, Social, and Educational Sciences (CEDIFOR) is gratefully acknowledged.