Cohort and Rhyme Priming Emerge from the Multiplex Network Structure of the Mental Lexicon

Complex networks recently opened new ways for investigating how language use is influenced by the mental representation of word similarities. This work adopts the framework of multiplex lexical networks for investigating lexical retrieval from memory. The focus is on priming, i.e., exposure to a given stimulus facilitating or inhibiting retrieval of a given lexical item. Supported by recent findings of network distance influencing lexical retrieval, the multiplex network approach tests how the layout of hundreds of thousands of word-word similarities in the mental lexicon can lead to priming effects on multiple combined semantic and phonological levels. Results provide quantitative evidence that phonological priming effects are encoded directly in the multiplex structure of the mental representation of words sharing phonemes either in their onsets (cohort priming) or at their ends (rhyme priming). By comparison with randomised null models, both cohort and rhyming effects are found to be emerging properties of the mental lexicon arising from its multiplexity. These priming effects are absent on individual layers but become prominent on the combined multiplex structure. The emergence of priming effects is displayed both when only semantic layers are considered, an approximated representation of the so-called semantic memory, and when semantics is enriched with phonological similarities, an approximated representation of the lexical-auditory nature of the mental lexicon. Multiplex lexical networks can account for connections between semantic and phonological information in the mental lexicon and hence represent a promising modelling route for shedding light on the interplay between multiple aspects of language and human cognition in synergy with experimental psycholinguistic data.

Network science provided language scientists with quantitative ways of representing and analysing the structure of lexical items within the mental lexicon [1,4,12,22]. For instance, concepts such as percolation techniques were used for detecting patterns of word confusability in phonology [12,22], strategies of language learning in healthy and clinical populations of children [6,23], differences in the levels of creativity of individual healthy subjects [3,11], or differences in the production of words in people with aphasia [17,18] or Alzheimer's disease [24]. However, the above studies considered only one aspect of language for establishing similarities among words, e.g., building single-layer networks including only phonological similarities among words [12]. While this focus was valuable for investigating on large scales how thousands of similarities among words influenced processes such as word identification or memorisation tasks [12,13,25], the way humans store and memorise words is inherently multirelational [1,23,26]. Multiple types of semantic and phonological similarities among words are present simultaneously, and they can either compete or assist specific language processes in different ways [1,5,27,28]. For instance, a recent empirical investigation indicated that toddlers simultaneously exploit both phonological and semantic features of words in early language learning [23,29].
Phonological and semantic relationships can also affect lexical retrieval in different ways. Lexical retrieval is a set of cognitive processes and executive functions related to the identification of a specific cognitive unit (e.g., a word) from semantic memory [30] subsequently to a given visual or auditory input (e.g., hearing or reading a given word) [15,31-33]. Conceptual similarities can cause the so-called priming phenomenon, where one lexical item (a prime) facilitates or inhibits the retrieval of another word (a target) [5,32,34,35]. Priming can happen with different modalities depending on how prime and target are processed (e.g., visual-visual, auditory-auditory, or crossmodal) and can involve perceptual, semantic, or conceptual types of similarities between prime and target [35]. Facilitative semantic priming happens when a target word (e.g., "hawk") is processed faster and more accurately when preceded by a semantically related stimulus (e.g., "dove") than when preceded by an unrelated word (e.g., "prosthetics") [34]. Empirical work has shown that facilitative semantic priming decayed more quickly over time when words were processed individually compared to when words were processed in sentences [35,36]. This empirical evidence has been linked with the richer structure of semantic associations among words in a sentence [35,36], indicating a positive correlation between word-word associations and facilitatory semantic priming. On the other hand, semantic inhibition or interference happens mainly through visual and perceptual modalities [35]. For instance, ignoring a picture representing a "dog" can produce subsequent slowing when responding to the word "cat".
Semantic priming typically only considers primes and targets belonging to the same semantic category (e.g., "hawk" and "dove" are both types of birds). However, words can be semantically related in other ways, which were often captured through free associations (e.g., "bed" and "pillow" are often provided as free associations when talking about bedroom furniture). Indeed, associative priming has been shown to crucially depend on the time between the beginning of the prime and the onset of the target [35,37], a time window also called stimulus onset asynchrony (SOA). A longer SOA between prime-target pairs corresponded to stronger facilitative priming effects, whereas nonassociated prime-target pairs corresponded to inhibitory priming effects independent of the SOA. Rather than exploiting taxonomical, semantic, or cooccurrence similarities, perceptual priming depends on the form of the stimulus. A similar priming effect occurs with phonological similarities [28,38]. Hearing primes can lead to easier lexical processing of phonologically similar target words [28,35].
Inhibitory priming relies on mechanisms restricting access to specific concepts, and the investigation of such inhibitory dynamics still represents an open challenge in the relevant literature [5,28,35]. Facilitative priming is well explained by network models of semantic memory [5,15,28,35,39,40] using spreading activation mechanisms. Although its mechanisms remain an open challenge in neuropsychology [5,11], past attempts have successfully modelled semantic memory as a complex network in order to obtain limited but meaningful insights into facilitative priming effects and lexical retrieval latencies in word identification tasks [1,2,15,39,40]. Collins and Loftus represented semantic memory as a conceptual network with links placed between concepts that shared features. When a given stimulus was activated (e.g., reading the word "animal"), then many words in the semantic levels of the mental lexicon received portions of activation, proportionally to their semantic relatedness to the stimulus. The activation spread across semantic similarities and it ensued until it converged on a single target, more or less related to the stimulus, which was then retrieved. Hence, lexical retrieval of an item was relative to a network node receiving a convergence of activation from across its connections. Importantly, the spread of activation could cover large distances over time but decreased in intensity. According to this model, the retrieval of target words was facilitated by having primes close or adjacent to the target words. Furthermore, the model could interpret empirical evidence of longer SOAs leading to stronger facilitative priming [37] in terms of activation accumulating over a given lexical item, leading to faster and more accurate concept retrieval.
In Collins and Quillian's experiments [40], subjects were asked to read and verify statements relating to two concepts, e.g., a canary is a bird. The time it took for participants to verify a statement correlated positively with the distance between concepts (e.g., canary and bird) in the conceptual network representation of semantic memory [39,40], i.e., the smallest number of semantic similarities connecting concepts. This represented preliminary evidence that network distance in semantic networks correlates with lexical retrieval patterns, although it was limited only to a rigid network structure encompassing only semantic features of words.
More recent approaches have modelled a semantic network as a web of free associations among concepts [3,11,15], i.e., relationships based on memory rather than on any strict definition of feature sharing. The importance of network distance for quantifying patterns of lexical retrieval was underlined in recent work by Kenett et al. [15]. The authors showed that success in free- and cued-recall experiments decreased dramatically with increasing distance between concepts in a network of free associations. Furthermore, network distance predicted success in recall experiments considerably better than mainstream psycholinguistic techniques such as latent semantic analysis [34]. Network distance has also been shown to influence lexical retrieval when considering a phonological network. For example, recent investigations showed how words at shorter mean network distance were more promptly recognised in a lexical decision task [14,25]. These results strongly indicate a cognitive advantage in processing concepts at shorter network distances. In a spreading activation model of lexical retrieval, network distance might capture how spreading activation decays over the mental lexicon structure, further promoting the usage of network models and network distances for the investigation of lexical retrieval.
Additional empirical evidence has shown that phonological similarities can reduce naming latencies in picture naming tasks, an effect known as phonological facilitation [27]. This evidence led to the inclusion of phonological aspects of the mental lexicon for obtaining more refined models of lexical retrieval from the auditory input. In case of hearing a word rather than reading it, more recent work has proposed a spreading activation mechanism including phonological similarities among words [12,41-44]. Within a bottom-up process, activation first spreads among phonological neighbours of the stimulus and then moves up across semantic memory, ultimately leading to word identification and retrieval.
In agreement with the above approaches, the present study adopts the assumption that the mental lexicon encapsulates not only linguistic features of individual words (e.g., their meaning, their orthography, their phonology, etc.) but also their similarities. However, the present investigation builds on the previous network approaches to lexical retrieval [14,15,25] by considering within the same network representation both semantic and phonological similarities among words through the framework of multiplex lexical networks [8-10, 16, 45]. In a multiplex lexical network, nodes represent words and links connect words differently according to specific network layers of similarities [8,9,45]. For instance, Stella et al. [8,10] used a multiplex lexical network with layers representing free associations, shared semantic features, cooccurrences, and phonological similarities, which successfully predicted early word acquisition in toddlers. The first large-scale application of multiplex lexical networks was from Stella et al. [9], where the mental lexicon of an adult was approximated as a multilayer network with four layers of word similarities: free associations, synonyms, generalisations, and phonological similarities. Through a data-driven approach, intersecting many large-scale datasets about word frequency, age of acquisition, concreteness, and reaction times in lexical identification tasks, the authors identified a multiplex lexical core, a set of words tightly interconnected with each other, appearing suddenly during normative development around age 8 yrs. This core made the whole multiplex lexical network extremely resilient to cognitive impairments modelled as progressive random word removal. Multiplex lexical networks were adopted also in a clinical population of people with aphasia, revealing the importance of the multiplex structure for predicting correct picture naming [16].
This paper adopts multiplex lexical networks for studying two specific patterns of phonological priming in lexical retrieval: cohort priming and rhyme priming. The term cohort priming comes from cohort theory, a theory of lexical retrieval by Marslen-Wilson and colleagues [31]. When hearing speech, the first phoneme heard "activates" every word in the lexicon with that phoneme in an access stage, resulting in a "cohort of words". For instance, hearing belief initially activates all words starting with the phoneme /b/, resulting in a very large cohort of possible words. As the next phoneme is heard, the cohort is further restricted, in this case, to words starting with /bI/ and so on, phoneme by phoneme. As more phonemes are added, fewer and fewer words are found as candidates until a recognition point is reached such that only one word is activated [31,33]. This recognition point is known also as isolation point or uniqueness point [31]. Cohort theory assumes a quite strict definition of cohorts and it does not consider lexical effects due to the structure of word-word similarities in the heard input (e.g., phrasal context) or in the mental lexicon [33]. However, empirical studies have confirmed that the initial portion of a word activates similar sounding words that compete for recognition and, more importantly, are quicker to identify when primed by words in the same cohort [31,33,46]. This facilitatory cohort priming effect was detected when primes were either English words or nonwords sharing the first three phonemes with the target [46], supporting the assumption of activation of lexical items based on their initial phonetic structure. Notice that the simultaneous activation of lexical items corresponds not only to facilitatory priming effects but also to lexical competition in distinguishing words from the same cohorts [47]. In word identification tasks without priming, targets in larger cohorts were found to be recognised less accurately than targets in smaller cohorts [47]. However, this competition effect disappeared when words were presented in a phrasal context [28], indicating that the semantic and syntactic features of words extracted by sentences can interact with cohort structure and influence lexical retrieval of words in cohorts. The above experimental findings motivate further investigation of cohort priming effects also in relation to the semantic and syntactic levels of the mental lexicon.
Rhyme priming is analogous to cohort priming, in that sharing phonemes at the end of words can give rise to facilitatory priming effects [46]. According to the relevant literature of priming effects, primes rhyming with a target lead to shorter and more accurate lexical retrieval compared to nonrhyming primes [46]. A similar rhyme facilitation of lexical decisions to real-word targets was found also in nonfluent people with aphasia [48]. Rhyme priming also has beneficial effects for the memorisation of words [49], especially in young children [38]. Empirical studies have shown that this type of priming is weaker than cohort priming but still present during lexical retrieval [49]. The current investigation of cohort and rhyme priming differs substantially from previous analyses of cohort and rhyme priming. Here, by assuming a network representation of the semantic and phonological subcomponents of the mental lexicon, the main aim is to detect cohort and rhyme priming effects in thousands of words by harnessing directly the structure of tens of thousands of word-word similarities of different types rather than directly testing only a limited number of words, as in previous lab experiments [31,33,38,49]. This multiplex network approach has three main strengths: (i) it can quantify which semantic or phonological layers are predominantly involved in potential priming effects; (ii) it can account for any potential interplay and nonlinear effects over priming arising from combining semantics and phonology, an interplay often neglected in previous network studies; (iii) it can be performed at large scales, testing a sample of words up to two orders of magnitude larger than in previous lab experiments [47].

Methods and Model
This section provides information on (i) the construction of the multiplex lexical network, (ii) the linguistic datasets used, (iii) the network metrics adopted and their psycholinguistic interpretation, and (iv) the null models used as a reference.
2.1. Construction of the Multiplex Lexical Network. The mental lexicon of an adult English speaker was represented as a multiplex lexical network including 8546 words connected over four network layers, analogous to previous approaches [9,16]. The layers have been selected according to the spreading activation model for auditory input [12,18,21,41,42], in which language processing happens first over a subcomponent containing phonological information about words and subsequently over semantic memory. Hence, the multiplex lexical network is chosen in order to combine phonological and semantic aspects of language. More in detail, information about phonology is mediated by a layer of phonological similarities [4,22], where words are connected if they differ in the addition/substitution/deletion of one phoneme, e.g., "cat" would be connected to "cab" because of the above operational definition of sound similarity. Notice that other patterns of sound similarity are not directly captured by this metric (e.g., "cat" and "cob", which are 2 phoneme substitutions apart). Information about semantic memory is encapsulated within three different levels: (i) overlap in meaning was encapsulated in a layer of synonyms, where words were connected if they can have the same meaning, e.g., "meaningful" and "insightful" can have the same meaning; (ii) the linguistic hierarchy of concepts was encapsulated in a layer of generalisations, where words were connected if they belonged to either a more specific or a more general semantic category, e.g., "dove" is a type of "bird"; (iii) most of the remaining semantic similarities among words were encapsulated within a layer of empirical free associations, where words were connected if they were associated by participants during a free association task, e.g., "bed" reminds participants of "sleep". It is important to underline that free associations, generalisations, synonyms, and phonological similarities were all found to deeply affect lexical retrieval in several independent studies [1,2,12,44,50], hence the importance of including them in the current investigation. The free association network was built as a subgraph of the Edinburgh Associative Thesaurus [50]. The synonym, the generalisation, and the phonological networks were built according to a dataset managed by Wolfram Research and based on WordNet 3.0 [51]. All layers were treated for simplicity as undirected, and no cost associated with between-layer transitions was considered, analogous to previous studies in the relevant literature [8,15,16]. Word features such as frequency were obtained from the large-scale repository Opensubtitles [52], which computes word frequencies from subtitles in TV series and movies.
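The operational definition of phonological similarity given above (one phoneme added, substituted, or deleted) can be illustrated with a minimal sketch. The phoneme transcriptions and the function names `one_phoneme_apart` and `phonological_layer` are hypothetical and introduced only for illustration; they are not taken from the WordNet-based dataset used in the study.

```python
from itertools import combinations


def one_phoneme_apart(a, b):
    """True if phoneme tuples a and b differ by exactly one
    substitution, addition, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one substituted phoneme.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one phoneme from the longer word must give the shorter one.
    return any(long_[:k] + long_[k + 1:] == short for k in range(len(long_)))


def phonological_layer(transcriptions):
    """Undirected edge set linking all words that are one phoneme apart."""
    return {
        frozenset((u, v))
        for u, v in combinations(sorted(transcriptions), 2)
        if one_phoneme_apart(transcriptions[u], transcriptions[v])
    }


# Toy transcriptions (assumed, for illustration only).
toy = {
    "cat": ("k", "ae", "t"),
    "cab": ("k", "ae", "b"),
    "cob": ("k", "aa", "b"),
    "at": ("ae", "t"),
}
edges = phonological_layer(toy)
```

In this toy lexicon "cat" is linked to "cab" and to "at", while "cat" and "cob" remain unlinked because they are two substitutions apart, mirroring the example in the text.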
As reported in Figure 1, the resulting multiplex network represents an edge-coloured graph [53,54]. The same set of nodes is replicated on each layer but different types/colours of links among nodes can be present, with each colour corresponding to a specific layer. On this structure, transitions between layers are allowed by transitioning between replicas of nodes. The multiplex structure alters dramatically the layout of similarities among words. Words disconnected on a layer might be highly connected and central on the whole multiplex structure, like for instance "say" in the layer of generalisations and in the whole multiplex lexical network (see Figure 1).
The imbalance in modelling the multiplex lexical network with three semantic layers but only one phonological layer is due to (i) the relative importance in distinguishing different semantic aspects of the lexicon (e.g., synonyms are different from taxonomical relationships) and (ii) the relative difficulty of considering measures of sound similarities that provide more information than the definition of phonological similarity adopted in this work (cf. [4]). However, it should be noted that the free association layer overlaps more than random expectation with the layer of phonological similarities [8], indicating that the association layer is not purely semantic but it contains also some sort of phonological information in it. This reduces the imbalance between semantics and phonology in the chosen representation. Nonetheless, previous similarity results [9] indicated that the layer of free association still contains patterns of word-word similarities that were more similar to those encoded in the synonym and generalisation layers rather than to the phonological layer. For the present analysis, the free association layer was considered as a semantic layer, compatible with what previous studies assumed [2,3,15,55].

2.2. Testing Cohort Theory.
According to the cohort model, lexical retrieval happens when the isolation point (see Introduction) is reached, corresponding to a peak time in activation [31,33]. Phonemes heard prior to the peak time determine the onset of the word and, consequently, the number of words in that word's cohort. While the peak time may change for each word based on its context, empirical evidence indicates that the average peak time of a word is around 200 ms from when the word gets pronounced [31] and corresponds to having information about the first 3 or 4 phonemes of the word [28,33]. Note that the above numbers represent average estimations, since the number of phonemes occurring in the 200 ms window can vary depending on the phoneme types (e.g., stops vs. fricatives vs. nasals). Since in the current dataset considering onsets made of 4 phonemes led to quite small cohorts, the focus shifted to onsets made of 3 phonemes, as tested also in previous studies [46]. For every onset available in the current dataset, a cohort of words was built. In order to reduce the extent of systematic errors due to small sample sizes, only cohorts with more than 10 words were considered. This led to the selection of 2526 words from the multiplex lexical network. Selected words were subdivided into 99 cohorts of average size 30 ± 10 words.
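The cohort selection described above can be sketched as a grouping of words by their first three phonemes followed by a size filter. This is a minimal illustration under stated assumptions: the function `build_cohorts`, its parameters, and the toy transcriptions are hypothetical, and the size threshold is lowered in the example so that a six-word lexicon still yields a cohort.

```python
from collections import defaultdict


def build_cohorts(transcriptions, onset_len=3, larger_than=10):
    """Group words by their first `onset_len` phonemes and keep only
    cohorts with more than `larger_than` words."""
    cohorts = defaultdict(list)
    for word, phonemes in transcriptions.items():
        # Words shorter than the onset cannot define or join a cohort.
        if len(phonemes) >= onset_len:
            cohorts[tuple(phonemes[:onset_len])].append(word)
    return {
        onset: sorted(words)
        for onset, words in cohorts.items()
        if len(words) > larger_than
    }


# Toy transcriptions (assumed, for illustration only).
toy = {
    "belief": ("b", "ih", "l", "iy", "f"),
    "belong": ("b", "ih", "l", "ao", "ng"),
    "beloved": ("b", "ih", "l", "ah", "v", "d"),
    "cat": ("k", "ae", "t"),
    "cab": ("k", "ae", "b"),
    "at": ("ae", "t"),
}
# The study keeps cohorts with more than 10 words; 2 is used here
# only because the toy lexicon is tiny.
toy_cohorts = build_cohorts(toy, onset_len=3, larger_than=2)
```

With the default `larger_than=10` the same call reproduces the filter stated in the text, provided a full set of phoneme transcriptions is available.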

2.3. Testing Rhyme Priming. Rigid definitions like considering only the overlap in phonemes in the last positions of words cannot capture the wide variety of rhyming patterns in English [49]. Rhymes depend not only on phoneme structure but also on additional features, like stress. In order to overcome this issue, the online rhyming dictionary RhymeZone was used for selecting groups of rhyming words [56]. RhymeZone is partially based on WordNet [51] but it is also enriched with additional data from quotes and lyrics. The complete corpus of RhymeZone includes semantic and phonological information over almost 19 million words from 1061 dictionaries; hence, it represents a large-scale and crosschecked source of current rhymes in the English language. The current analysis focused on true rhymes, i.e., words with identical sounds after a stressed vowel. Homophones, different words having exactly the same phonemes, were not considered as rhyming words. According to this choice, 2247 rhyming words were selected from the multiplex lexical network. Selected words were subdivided into 51 rhyme classes (e.g., all words rhyming with "authorisation"), of average size 40 ± 10 words. In order to reduce the extent of systematic errors due to small sample sizes, only classes with more than 10 words were considered.
2.4. Network Metrics. As indicated in many recent investigations about lexical retrieval in semantic and phonological subcomponents of the mental lexicon, network distance is a reliable proxy of word relatedness as it is predictive of lexical retrieval [3,11,15,57]. Network distance $d_{ij}$ between nodes i and j in a given network N is defined as the smallest number of links connecting i and j [58]. In cases where there is no path connecting i and j, then nodes i and j are said to be disconnected and $d_{ij}$ is assumed to be equal to ∞. As reported in Figure 1, in the multiplex lexical network, paths can be made of links of different layers/colours. Therefore, there can be additional, nontrivial "multiplex" paths emerging from the multiplex structure, so that the network distance between two words on any individual layer can be dramatically different from the network distance between the same words on the whole multiplex network. For instance, bed and sleep might be disconnected on the phonological layer but connected on the free association layer. This richer behaviour of network distance on the multiplex network represents the interplay between phonological and semantic aspects of the mental lexicon. Notice that the whole multiplex lexical network is fully connected in the sense of De Domenico et al. [54], i.e., there is always a multiplex path connecting any two words when transitions across layers are allowed. However, individual layers are not fully connected, so that some words might be disconnected and hence correspond to a divergent distance $d_{ij} = \infty$. In order to overcome the issue of having infinite distances, the closeness $c_{ij}$ of nodes i and j [58] is used, namely, the inverse of network distance: $c_{ij} = 1/d_{ij}$, where $c_{ij} = 0$ when i and j are disconnected. Considering the inverses of network distance gets rid of divergences, so that average finite estimators of distance can be computed. Provided that in the analysis individual network layers might be disconnected, a valid proxy for the central moment of the distribution of closeness is represented by the mean [58]: $c^{*} = \frac{2}{n(n-1)} \sum_{i<j} c_{ij}$, where the sum runs over all pairs among the $n$ nodes considered, ranging from 0 (all nodes are disconnected) to 1 (all nodes are adjacent with each other). $c^{*}$ is the inverse of the harmonic mean of the distances of all node pairs in a given network, a measure also called efficiency [58]. Notice that $c^{*}$ is analogous but not equivalent to closeness centrality $C_i$, which is based on the arithmetic mean of distances of node pairs (for a comparison see [58]). In disconnected networks, the harmonic mean is a better estimator of closeness compared to the arithmetic mean; hence, in the following, $c^{*}$ is adopted for estimating how close words are on the multiplex lexical network. We assume that primes and targets that are closer on a network topology are processed faster and more accurately than words at greater network distance, as supported by recent empirical studies [15,25,57]. Closeness is computed among words in specific subsets: (i) words in the same cohort and (ii) words having the same rhyme (i.e., composing a rhyming class).
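The mean closeness of a group of words can be sketched with breadth-first search, assuming, as stated in the Methods, undirected layers and no cost for between-layer transitions, so that the multiplex can be searched as the union of its layers. The function names and the toy links below are hypothetical and serve only as an illustration of how a multiplex path can connect words that are disconnected on a single layer.

```python
from collections import deque
from itertools import combinations


def adjacency(*layers):
    """Merge one or more undirected edge lists into one adjacency map."""
    adj = {}
    for layer in layers:
        for u, v in layer:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def distances_from(adj, source):
    """Breadth-first search: shortest path lengths from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def mean_closeness(adj, words):
    """Average of 1/d over all pairs of distinct words;
    disconnected pairs contribute 0."""
    words = list(words)
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    dist = {w: distances_from(adj, w) for w in words}
    total = sum(1.0 / dist[u][v] if v in dist[u] else 0.0 for u, v in pairs)
    return total / len(pairs)


# Toy layers (assumed links, for illustration only).
associations = [("bed", "sleep")]
phonology = [("bed", "bad"), ("sleep", "sleet")]
group = ["bed", "sleep", "sleet"]

c_phonology = mean_closeness(adjacency(phonology), group)
c_multiplex = mean_closeness(adjacency(associations, phonology), group)
```

On the phonological layer alone only "sleep" and "sleet" are connected, whereas on the combined structure "bed" reaches "sleet" in two steps through "sleep", so the mean closeness of the group increases.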
2.5. Null Models. Quantifying the average closeness of words in cohorts and in rhyme classes requires a suitable null model for comparison and statistical testing. Since phonological information is important for defining both cohorts and rhymes, considering randomised lists of words satisfying constraints at the phonological level is an intuitive choice. As a viable approach, randomised cohorts/rhyme classes are built by sampling at random real words sharing at least m phonemes in any position. Both consecutive and nonconsecutive shared phonemes had to be considered, since limiting the null model to consider only overlapping consecutive phonemes outside of the onset/end resulted in sample size issues, e.g., too few words for statistical comparisons with cohorts and rhyme classes. Randomised cohorts/classes have the same size as the original ones. For cohorts, m is equal to 3, because in the operative definition of cohorts onsets are defined as having the same first three phonemes as a consequence of the average peak time. For rhymes, m can range between 2 and 4; the appropriate value is computed by calculating the number of phonemes that all words in a rhyme class have at their ends. The same m phonemes defining a cohort/class are used for building its randomised counterpart. For instance, consider the cohort "belief", "belong", "beloved", ... defined by phonemes /b/, /I/, /l/. A randomised cohort will include words sharing these phonemes but in positions different from the onset, e.g., "automobile", "abolish", "assembly", .... Preserving phoneme identity is important because different phonemes might lead to differences in phonological awareness and influence lexical processing [20].
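A randomised cohort of the kind described above can be sketched as follows: candidate words must contain all the phonemes defining the cohort, in any position, but must not have them as their onset. The function `randomised_cohort` and the toy transcriptions are hypothetical; the sketch also ignores phoneme multiplicity, which is a simplifying assumption. A rhyme-class variant would test the end of the word instead of the onset.

```python
import random


def randomised_cohort(transcriptions, onset, size, rng):
    """Sample `size` words containing every phoneme of `onset`
    somewhere in the word, excluding words that start with `onset`."""
    onset = tuple(onset)
    candidates = sorted(
        word
        for word, phonemes in transcriptions.items()
        if tuple(phonemes[:len(onset)]) != onset
        and all(p in phonemes for p in onset)
    )
    if len(candidates) < size:
        raise ValueError("not enough candidate words for this cohort size")
    return rng.sample(candidates, size)


# Toy transcriptions (assumed, for illustration only).
toy = {
    "belief": ("b", "ih", "l", "iy", "f"),
    "belong": ("b", "ih", "l", "ao", "ng"),
    "abolish": ("ah", "b", "aa", "l", "ih", "sh"),
    "nibble": ("n", "ih", "b", "ah", "l"),
    "cat": ("k", "ae", "t"),
}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
random_list = randomised_cohort(toy, ("b", "ih", "l"), size=2, rng=rng)
```

Repeating the sampling many times for each cohort would give a distribution of closeness values under the null model against which the empirical cohort can be compared.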
The phonological constraint on the randomised lists guarantees that they contain the same phonemes as the original cohorts/rhyme classes but in positions different from the onset/end of the word. Therefore, the considered null models allow us to test how phoneme sequences at the beginning and at the end of individual words influence lexical processing in relation to the multiplex structure. Hence, the proposed methodology investigates to what extent the multiplex lexical network is nonrandomly structured to cluster onset-sharing and rhyme-sharing words. To this aim, differences among individual phoneme sequences are averaged across cohorts/rhyme classes and a statistical test is performed between the average closeness of cohorts/rhyme classes and random expectation from the above null models. Nonparametric statistical testing, specifically a sign test, is adopted in order to obtain results robust to violations of normality due to the low sample size of cohorts or rhyme classes.
On the layers of free associations, synonyms, and generalisations, the distribution of average closeness for words in cohorts and rhyme classes was found to violate normality (Kolmogorov-Smirnov test, D > 0.08, p > 0.09) at a 0.05 significance level.
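For illustration, one common form of the sign test is the exact two-sided paired version sketched below. The text does not specify which variant was used, so the pairing of each cohort's closeness with its null-model expectation, the function name `sign_test`, and the numbers in the example are all assumptions.

```python
from math import comb


def sign_test(observed, expected):
    """Exact two-sided sign test on paired values.
    Ties are discarded, as is customary for this test."""
    diffs = [o - e for o, e in zip(observed, expected) if o != e]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Probability of a split at least this extreme under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical closeness values for five cohorts and their random expectations.
cohort_closeness = [0.42, 0.39, 0.45, 0.41, 0.40]
random_closeness = [0.35, 0.36, 0.37, 0.34, 0.38]
p_value = sign_test(cohort_closeness, random_closeness)
```

Because the test only uses the sign of each difference, it makes no assumption about the shape of the closeness distribution, which is the robustness property motivating its use here.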
Comparison with the null models also enables one to test whether potential differences in closeness between cohorts/rhymes and the randomised lists can be explained either by individual aspects of language or by the interplay between them, e.g., phonology and semantics or different aspects of semantics. This is achieved by computing network distance on individual layers and on the whole multiplex network representation separately. These results are then compared against another set of null models for the network layers where links are randomised. In each randomised layer, words have the same number of connections as in the respective empirical layer but connections are rewired uniformly at random. Hence, random rewiring preserves the degree distribution of words on a layer. Since the same word can have different degrees on different layers [9], then different rewired null models have to be adopted for the different layers of the multiplex network. These null models are also called configuration models in the network literature [59], and they preserve the number of total word-word similarities of individual words (i.e., node degrees) and also the heterogeneity in the number of similarities individual words can have (i.e., degree heterogeneity). Randomly rewiring every individual layer is expected to disrupt both intralayer correlations between nodes and interlayer correlations between links. Therefore, configuration models allow quantifying to what extent differences in closeness between cohorts/rhyme classes and random lists of words are due either to global patterns of network structure (which are disrupted by random rewiring) or just to heterogeneity in link allocation (which is fixed even under random rewiring).
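Degree-preserving randomisation of a single layer can be sketched with repeated double edge swaps, a standard way of sampling graphs with a fixed degree sequence. The text does not state which rewiring algorithm was used, so this choice, the function `rewire_layer`, and the toy layer are assumptions made for illustration.

```python
import random
from collections import Counter


def rewire_layer(edges, n_swaps, rng):
    """Randomise an undirected layer by double edge swaps:
    (a-b, c-d) becomes (a-d, c-b). Every node keeps its degree."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would create a duplicate link
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degrees(edges):
    """Number of links attached to each node."""
    return Counter(node for edge in edges for node in edge)


# Toy layer (assumed links, for illustration only).
layer = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"),
         ("e", "f"), ("f", "a"), ("a", "d")]
rewired = rewire_layer(layer, n_swaps=10, rng=random.Random(0))
```

Each layer of the multiplex would be rewired independently with its own degree sequence, in line with the observation that a word can have different degrees on different layers.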

Results
Results are presented in two stages. First, the suitability of the adopted representation from a language perspective (considering word frequency) and from a network perspective is reported. Cohort and rhyme priming effects are then analysed by using network distance and by considering specific reference null models as a comparison.

The Relevance of the Multiplex Lexical Representation.
The selected multiplex network representation of the mental lexicon is composed of layers including semantic and phonological aspects of the mental lexicon of relevance in the literature about lexical retrieval (see also Methods). However, this structure needs further validation since it must: (i) correspond to commonly used words, and also (ii) correspond to a structure that cannot be further aggregated, i.e., network layers should display different patterns of word similarities in order to further motivate the choice of considering them as separate multiplex layers.
Figure 2 reports the frequencies of words in the multiplex lexical network and in reference datasets from Opensubtitles [52]. The probability of finding words with a frequency higher than 10 is one order of magnitude larger in the multiplex network than in the whole Opensubtitles dataset. The multiplex lexical network is richer in terms of commonly used words compared to the language used in movies, which can also contain more specific and less frequent words (e.g., specific jargon, geographical names, etc.). Furthermore, the words in the multiplex lexical network are almost as frequent as the most frequent words in Opensubtitles (see for reference the probabilities of finding words with a frequency higher than 10^3 in Figure 2). Based on these results, the conclusion is that, in terms of word frequency, the multiplex lexical network includes commonly used words and is representative of the most common semantic and phonological features of spoken English.
The choice of keeping free associations, synonyms, generalisations, and phonological similarities as separate is supported by a structural reducibility analysis, an entropy-based technique for establishing the information about network paths that is lost when layers are aggregated in a given multiplex network (see De Domenico et al. [60] for the technical details). Analogous to previous investigations with multiplex lexical networks based on other datasets [8,45], the multiplex lexical network used in the current study (cf. [9]) is irreducible. In other words, a significant number of patterns of word-word similarities could be lost if any two or more layers of the multiplex lexical network were projected onto one layer only. The free association layer is also found to be distinct compared to generalisations, synonyms, and phonological similarities, so that it should not be combined with any of these three layers. This finding confirms that the considered layers are representative of different aspects of the mental lexicon, which should be kept as distinct.
All in all, the frequency analysis indicates that the investigated multiplex lexical network is almost as rich in commonly used English words as, and poorer in infrequent lexical items than, the larger sample of words from Opensubtitles, which includes 5 · 10^5 lexical items and is representative of currently spoken English. The irreducibility analysis is another important element as it motivates the consideration of the chosen aspects of semantics and phonology through separate layers in the multiplex network. Hence, both the frequency and the structural reducibility analyses confirm the suitability of the multiplex lexical representation for investigating patterns of the mental lexicon for the English language.
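The frequency comparison above rests on the cumulative probability P(X ≥ f) of finding a word with frequency at least f. A minimal sketch of that quantity follows; the function name `ccdf` and the toy frequency counts are hypothetical and do not come from the Opensubtitles data.

```python
def ccdf(frequencies, thresholds):
    """Return P(X >= f) for each threshold f, given a list of word frequencies."""
    n = len(frequencies)
    return {f: sum(1 for x in frequencies if x >= f) / n for f in thresholds}


# Toy frequency counts (hypothetical): a small lexicon of common words and
# a larger corpus that also contains many rare words.
toy_lexicon = [3000, 1200, 850, 95, 40, 15, 12]
toy_corpus = toy_lexicon + [5, 3, 2, 2, 1, 1, 1]

thresholds = [10, 1000]
lexicon_ccdf = ccdf(toy_lexicon, thresholds)
corpus_ccdf = ccdf(toy_corpus, thresholds)
```

In this toy setting every word of the small lexicon has frequency at least 10, whereas only half of the corpus words do, which mirrors qualitatively how a lexicon of common words sits above the full corpus curve.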

The Multiplex Lexical Network Identifies Cohort Priming.
As reported in the introduction, facilitative semantic, associative, and phonological priming effects are well explained by activation spreading models over shorter network paths of word-word similarities in the mental lexicon [35,37,39,41]. There is additional evidence that inhibitory semantic priming also depends on the proximity of concepts on semantic networks [61]. Furthermore, as confirmed by recent studies [15,25,57], closeness is a reliable estimator of the efficiency of lexical processing; closer words on semantic and phonological networks tend to be retrieved faster and more accurately than words farther apart. Figure 3(a) compares the closeness of words in cohorts and in randomised lists on individual layers and over the whole multiplex structure. Error bars indicate error margins over the median. At the significance level α = 0.05, the differences in closeness between words in cohorts and words in null models are not statistically significant on the free association layer (sign test, n+ = 52, p = 0.65) and on the synonyms layer (sign test, n+ = 56, p = 0.23). Statistically significant differences are observed on the generalisations layer (sign test, n+ = 72, p < 10^-5) and on the phonological layer (sign test, n+ = 63, p = 0.008). A statistically significant difference is also found on the whole multiplex structure (sign test, n+ = 73, p < 10^-5).
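For readers unfamiliar with the sign test used above, a minimal sketch of an exact two-sided version is given below. It assumes that n+ counts the cohorts whose empirical closeness exceeds that of the matched random list, that ties are discarded, and that the null hypothesis is a fair coin. The function names and the toy values are hypothetical, and the code is not meant to reproduce the reported p values, since the total number of compared groups is not restated here.

```python
from math import comb


def count_signs(empirical, random_reference):
    """Count pairs where the empirical value is larger (n_plus) or smaller (n_minus)."""
    n_plus = sum(1 for e, r in zip(empirical, random_reference) if e > r)
    n_minus = sum(1 for e, r in zip(empirical, random_reference) if e < r)
    return n_plus, n_minus  # ties are discarded


def sign_test_p(n_plus, n_minus):
    """Exact two-sided sign test p-value under a Binomial(n, 1/2) null."""
    n = n_plus + n_minus
    k = max(n_plus, n_minus)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Toy closeness values for five word groups (hypothetical numbers).
empirical = [0.31, 0.28, 0.40, 0.22, 0.35]
random_reference = [0.25, 0.27, 0.30, 0.24, 0.29]

n_plus, n_minus = count_signs(empirical, random_reference)
p_value = sign_test_p(n_plus, n_minus)
```

The test only uses the direction of each paired difference, not its size, which makes it robust to skewed closeness distributions.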
Words in cohorts are found to be on average closer than random expectation on specific layers, indicating the presence of a cognitive influence when processing them together and hence a priming effect. The gap observed in the phonological layer can be attributed to a tendency for words in the same cohorts to persist in the same connected component. In fact, the lower inverse network distance/closeness of the null model relates to the fragmentation of the phonological network (cf. [22]), so that words in the same cohort can have zero closeness, and this ultimately lowers the average closeness score. Therefore, despite both phonological links and cohorts being based on measures of phonological similarities, the observed gap between empirical and random average closeness of words is an indication of the clustering of cohorts over the same connected components in the multiplex lexical network. Interestingly, cohorts in the generalisation layer are also closer than random expectation. Given that cohorts are based on word forms, this clustering over a semantic layer might be the consequence of a form-meaning correlation, a phenomenon called form-meaning nonarbitrariness and empirically traced in English and many other languages [62]. The magnitude of the gaps in closeness found over the multiplex network and over the generalisations and the phonological layers does not correlate with the cohort size (Kendall Tau τ < 0.07, p values > 0.4). This analysis is directly based on the layout of hundreds of thousands of word similarities in a multiplex lexical network representative of commonly spoken language.
Notice that the difference in closeness between cohorts and random lists persists also when the phonological layer is not included in the analysis. This is an effect arising from the nonlinear combination of the shortest paths in the multiplex structure. While there is no statistical difference on the individual layers of free associations and synonyms, when they are considered together with generalisations the resulting multiplex representation displays a higher closeness for words in cohorts than for randomised lists (sign test, n+ = 71, p = 10^-5). Importantly, this difference is not due to generalisations. A difference in closeness between cohorts and random expectation arises also in the multiplex network having only free associations and synonyms as network layers (sign test, n+ = 69, p = 0.001). Although individual layers do not display indications of cohort priming, the multiplex lexical network structure does. Since in the model all layers except the phonological one represent semantic memory, this finding is an indication that cohort priming is not exclusively due to phonology but is present also in the combined semantic aspects of the English language.
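The nonlinear combination of shortest paths can be made concrete with a small sketch. It assumes that closeness between two words is the inverse of their shortest path length (zero when no path exists) and that a multiplex path may use links from any layer, so that distances are computed on the union of the layers. The function names and the toy links are hypothetical and are not taken from the empirical network.

```python
from collections import deque
from itertools import combinations


def adjacency(edge_lists):
    """Build one adjacency map from the links of one or more layers."""
    adj = {}
    for edges in edge_lists:
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def distance(adj, source, target):
    """Breadth-first shortest path length, or None if the words are disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, d = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour == target:
                return d + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, d + 1))
    return None


def mean_closeness(adj, words):
    """Average inverse distance over all pairs of words (0 for disconnected pairs)."""
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for u, v in pairs:
        d = distance(adj, u, v)
        if d:
            total += 1.0 / d
    return total / len(pairs)


# Toy layers (hypothetical links): no single layer connects the two cohort words.
associations = [("star", "night")]
synonyms = [("night", "dark")]
generalisations = [("dark", "start")]
cohort = ["star", "start"]

single_layer = [mean_closeness(adjacency([layer]), cohort)
                for layer in (associations, synonyms, generalisations)]
multiplex = mean_closeness(adjacency([associations, synonyms, generalisations]), cohort)
```

Here the two words have zero closeness on every individual layer but are three steps apart on the multiplex, which is the kind of shortcut that cannot be obtained by averaging layer-wise results.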
When network links are rewired at random in configuration models (see Methods), differences in closeness vanish on all individual layers. Figure 3(b) reports the average closeness of empirical cohorts on the randomised network structure. The sign tests give the following results for the layers: n+ = 52, p = 0.69 for free associations, n+ = 55, p = 0.31 for synonyms, n+ = 53, p = 0.55 for generalisations, and n+ = 60, p = 0.05 for phonological similarities. The above results indicate that the cognitive advantage expressed by closeness [14,15,25] depends on the global structure of individual layers and not on the heterogeneity in the allocation of similarities words might have on each layer when considered individually (e.g., heterogeneity on phonological neighbourhood sizes on the phonological layer, number of associates to a word, etc.). However, even in the configuration models, words in cohorts are closer than random expectation on the whole multiplex structure (sign test, n+ = 63, p = 0.009). On the whole multiplex structure, the degree heterogeneity of individual layers gets combined together, so that preserving degree correlations across layers ultimately still leads to traces of cohort priming effects. This finding indicates that degree heterogeneity determines the availability of shortcuts among words in the same cohort. It also further indicates that priming emerges from the multiplex combination of different aspects of language.
Figure 3: (a) Mean closeness distance of words either in cohorts or in randomised lists in the layers, respectively, made of free associations (Asso.), synonyms (Syno.), generalisations (Gene.), and phonological similarities (Phon.). (b) The same as in (a) but for randomised layers of word-word similarities.

The Multiplex Lexical Network Identifies Rhyme Priming.
As with cohorts, the average closeness of words in a rhyme class is compared against that of words from randomised lists (see Methods). Figure 4(a) compares the median closeness of words in rhyme classes (orange bars) and in random lists (blue bars) on individual layers and over the whole multiplex structure. Error bars indicate error margins over the median. At the significance level α = 0.05, the differences in closeness between words in rhyme classes and words in null models are not statistically significant only on the free association layer (sign test, n+ = 30, p = 0.26). Statistically significant differences are observed on the synonyms layer (sign test, n+ = 42, p < 10^-5), the generalisations layer (sign test, n+ = 36, p = 0.005), and the phonological layer (sign test, n+ = 41, p = 10^-5). A statistically significant difference is also found on the whole multiplex structure (sign test, n+ = 42, p < 10^-5).
Analogous to cohorts, words in rhyme classes are on average closer than random expectation, indicating a cognitive advantage [14,15,25] in processing them together and hence a priming effect. In more detail, this structure suggests a cognitive advantage in lexical processing, assuming cognition is driven by similar network structures and assumptions based on lexical similarity. The magnitude of the gaps in closeness between rhyme classes and random expectations does not correlate with class size (Kendall Tau τ < 0.06, p values > 0.5).
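The check that closeness gaps do not scale with group size relies on Kendall's rank correlation. A minimal sketch of the tie-corrected coefficient (tau-b) is given below; whether the study used tau-a or tau-b is not stated, so this choice is an assumption, and the function name and the toy sizes and gaps are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equally long sequences."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Toy data (hypothetical): group sizes and the corresponding closeness gaps.
class_sizes = [20, 35, 50, 80, 100]
closeness_gaps = [0.05, 0.02, 0.06, 0.03, 0.04]
tau = kendall_tau_b(class_sizes, closeness_gaps)
```

A value of tau near zero, as in this toy example, means that larger groups are no more likely than smaller ones to show a larger gap.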
Interestingly, rhyme priming is present on one more layer than cohort priming. The layer of synonyms does not display cohort priming but features rhyme priming instead. Notice that rhyme priming persists also in the structure of semantic memory represented by free associations, synonyms, and generalisations (sign test, n+ = 37, p = 0.002), again indicating that the multiplex interplay between individual aspects of language can provide evidence of priming effects that might be partially absent when these aspects are considered separately.
When network links are rewired at random in configuration models (see Methods), differences in closeness vanish on all individual layers. Figure 4(b) reports the average closeness of empirical rhyme classes on the randomised network structure. The sign tests give the following results for the layers: n+ = 32, p = 0.09 for free associations, n+ = 32, p = 0.09 for synonyms, n+ = 29, p = 0.40 for generalisations, and n+ = 33, p = 0.05 for phonological similarities. Even in the configuration models, words in rhyme classes are closer than random expectation on the whole multiplex structure (sign test, n+ = 35, p = 0.01). Analogous to what happens with cohorts, this result indicates that degree heterogeneities of individual layers get combined together and provide shortcuts to rhyming words that still relate to rhyme priming effects. It has to be underlined that in configuration models rewiring is random but it is always constrained by degree, so that some core-periphery structure induced on the network by the degree distribution can still be present even under randomisation of links. Here, random rewiring does not disrupt shortcuts among words in the same rhyme class. This indicates that the degree heterogeneity in the allocation of word-word similarities and the multiplex combination of layers are both important factors for determining rhyme (and cohort) priming.
Notice that the closeness of words is lower in the phonological network compared to other network layers for both cohorts and rhyme classes. This indicates that words in cohorts/rhyme classes tend to cluster more on semantic layers than on the phonological layer, even though the considered groups of words relate to phonological priming. This difference is compatible with the fact that the phonological layer includes words with an average of six phonemes, so that even words sharing on average three phonemes in their onsets or at their ends might not have edit distance equal to one and hence they might not be connected with each other. Furthermore, the phonological layer is significantly more disconnected than the semantic layers (cf. [9,22]), so that the lower closeness might be due to words being in different connected components of the phonological network. Notice that if word clustering were a consequence of the definition of the phonological layer, then randomly selected words should also cluster to a similar extent when compared to words in cohorts and rhyme classes. Instead, the presence of a closeness gap on the phonological layer indicates that words in cohorts and rhyme classes tend to belong to the same connected component of the phonological layer.
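The point that sharing an onset does not guarantee a phonological link can be illustrated with a short sketch of the linking rule, in which two words are connected when their phoneme sequences differ by exactly one substitution, insertion, or deletion. The function name `one_edit_apart` and the rough, ARPAbet-like transcriptions are illustrative assumptions rather than the transcriptions used to build the phonological layer.

```python
from itertools import combinations


def one_edit_apart(p, q):
    """True if phoneme tuples p and q differ by one substitution, insertion, or deletion."""
    if len(p) == len(q):
        return sum(a != b for a, b in zip(p, q)) == 1
    if abs(len(p) - len(q)) != 1:
        return False
    short, long_ = (p, q) if len(p) < len(q) else (q, p)
    # Deleting one phoneme from the longer word must give the shorter word.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


# Rough, illustrative transcriptions (assumptions, not the study's data).
words = {
    "cat": ("k", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "cast": ("k", "ae", "s", "t"),
    "candle": ("k", "ae", "n", "d", "ah", "l"),
    "candid": ("k", "ae", "n", "d", "ih", "d"),
}

# Link every pair of words that is exactly one phoneme edit apart.
edges = [(u, v) for u, v in combinations(sorted(words), 2)
         if one_edit_apart(words[u], words[v])]
```

In this toy lexicon "candle" and "candid" share their first four phonemes yet remain unlinked, because they differ in two positions; words of this kind end up in different connected components and contribute zero closeness on the phonological layer.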

Discussion
Through the framework of multiplex lexical networks, this paper provides an elegant model to account for and predict potential cognitive advantages [14,15,25] in processing together words sharing the same onset or rhyme. Comparison against null models indicates that these priming effects can be detected already, but not exclusively, at the structural level of word-word similarities when multiple sources of linguistic relations are integrated together rather than indirectly measured with latencies in a laboratory task. The results reported in this analysis correspond to previous work on priming in the psycholinguistic literature and open novel modelling challenges in the investigation of priming through complex networks.
First, the persistence of phonological priming patterns also outside of the phonological layer is an additional confirmation of a nonarbitrariness of language in terms of form-to-meaning correspondences [5,62] (e.g., English words sharing the onset "sn-" expressing mainly concepts related to "nose"). For a given language, nonarbitrariness refers to the existence of statistical relationships between sound patterns and semantic usage of classes of words. This systematicity corresponds to facilitatory effects in terms of early word learning [62], i.e., children learning words more accurately when spotting systematic and language-specific relations between form and semantic category. The result of phonological priming effects arising also from the combination of hundreds of thousands of semantic, multiplex word-word similarities provides quantitative and large-scale evidence of a nonrandom semantic organisation of language that is influenced by phonological regularities such as onset sharing or rhyming.
It is important to underline that cohort and rhyme priming effects have long been detected and investigated in experimental psychology [31,33,38,49], although evidence for them was based only on small samples of hundreds of words being tested in memory-related tasks. The novelty of the current approach is that it is directly based on the large-scale structure of hundreds of thousands of word-word similarities among thousands of commonly used English words interrelated across several semantic and phonological aspects of language. The current network approach is therefore different from an experimental setup from psycholinguistics in that the network paradigm scales up and tests thousands of words in a considerably easier way compared to the time and effort required in working with subjects in experiments. Also, network representations rely on experiments, but once built, a network can then be used for testing a wide variety of conjectures. For instance, the same network of free associations has been used multiple times for detecting patterns of word learning [7-9], identifying individual creativity levels [2,3,11], or even predicting word production in clinical populations [16]. The increasing adoption of complex network models in the cognitive sciences can be beneficial in terms of quantifying large-scale patterns of language usage and acquisition, mainly because of the high versatility of network models [4,6,11,17,55]. It must also be underlined that network representations bear some assumptions with them and are indeed approximated representations of complex systems. For instance, the multiplex lexical network assumes that all links are weighted equally and are always present over time, but this might not be the case in a structure as dynamic as the mental lexicon [5]. Understanding to what extent a network approach is valuable always requires comparison with empirical evidence, often provided by smaller-scale experimental studies. A synergy between theoretical network models and experimental psycholinguistic data represents a valuable combination for future cutting-edge research, a possibility made more appealing by the recent availability of larger digital corpora and massive online psycholinguistic datasets like Opensubtitles [52].
Network approaches must work in synergy with experimental data and more specific experimental setups in order to answer the challenges revealed by network structure. An important example is the attribution of a facilitatory or inhibitory nature to the closeness gaps identified in the current investigation. In fact, a shorter distance among words in cohorts or rhyme classes could also mean higher competition levels among words and hence have an inhibitory, rather than facilitatory, effect on word processing [28,38,46]. However, previous experimental studies found that cohort competition effects are stronger for larger cohorts [28,38]: the more words are activated, the stronger the competition effect. This competition is present at phonological and also at semantic levels, and it leads to slower performance on lexical decision tasks. In the current investigation, both smaller (i.e., comprising 20 words) and larger (i.e., comprising 100 words) cohorts consistently displayed the same priming patterns reported in the manuscript. Differently put, words in cohorts are always closer than random expectation on the multiplex lexical structure and this gap is independent of cohort size. Since competition effects are size-dependent [28,47] while priming effects are not [46], this finding might be an important indication that the differences in the shortest path lengths found in this work represent mainly priming effects rather than lexical competition. Assessing the facilitatory or inhibitory nature of these priming patterns requires additional empirical data and represents an interesting future research direction.
Notice that cohort priming is not the only effect driving lexical retrieval. The cohort model neglects important aspects of language such as syntactic structure, which can significantly alter access to semantic memory [3,11,33]. Recently, experiments from cognitive neuroscience have indicated that cohort effects and lexical competition levels are present when words are processed individually while competition is absent when words are heard in short sentences [28]. Also, semantic information aided the discrimination process of words in larger cohorts [47]. The disappearance of cohort competition effects in sentences or in the presence of semantic information indicates that word similarities and syntactic structure are both highly important in driving activation to specific target words, thus ultimately having a facilitatory, rather than inhibitory, effect on lexical retrieval. Although not fully coincident with the same richness of semantic information from sentences, the adopted multiplex representation does not consider words as disconnected units but rather provides information also about word context through similarities, e.g., the link between "play" and "act" represents the context of theatrical plays and the link between "play" and "football" represents the context of games. Hence, previous findings of context [28] and semantic word similarities [47] reducing lexico-phonological competition might represent an additional indication that the patterns found in this investigation are facilitatory rather than inhibitory. Notice also that in the relevant literature there is strong evidence for facilitatory priming to correlate positively with concept relatedness [35-37] and for inhibitory priming to be mainly driven by ignoring unrelated concepts [5,35,61]. Combining this literature with the recent studies indicating that shorter network distance is a valid proxy for closer conceptual relatedness [15,57] further indicates that the priming effects detected on the multiplex structure are mainly facilitatory. This is in agreement also with the previous experiments specifically focused on phonological priming and indicating that cohort effects facilitate word memorisation [31,33,46] while rhyming facilitates phonological awareness, specifically in children [38]. In order to fully address the nature of the patterns highlighted by the multiplex structure, a psycholinguistic experiment involving the cohort/rhyming words analysed in this investigation would be an important future research direction. By considering reaction times in a lexical decision task, it would be interesting to understand if there is any critical threshold c* of closeness above which lexical competition might overcome facilitation, e.g., lexical items being so close that they can be confused, thus inhibiting retrieval of the correct item. Another interesting research direction would be correlating closeness gaps to competition effects in cohort priming arising from interactions of specific word suffixes, which can inhibit one another [63].
From a network perspective, the current investigation provides additional empirical evidence that multiplex networks can highlight phenomena that cannot be detected by single-layer networks. In fact, for both cohorts and rhyme classes, individual layers do not always display priming effects, while the multiplex network obtained by combining together these layers always highlighted statistically significant differences in terms of network distances. By assuming that these differences indicate a cognitive facilitation in processing words together, as indicated by many recent studies [15,25,57], the above results quantitatively indicate that cohort and rhyme priming can arise from an interplay either between different aspects of semantic memory (e.g., synonyms and free associations) or between different aspects of the whole mental lexicon (e.g., phonological similarities and free associations). In more detail, assuming that lexical retrieval is influenced by a multilayer network structure of the mental lexicon, phonological priming effects might then be an emergent property of the adopted multiplex representation of the lexicon as it arises from the multiple interactions among words across different aspects of language.
Notice that the gap in closeness between cohorts/rhyme classes and random expectation in the empirical multiplex network is almost an order of magnitude larger than in the randomly rewired multiplex network, containing random links between words. This indicates that the detected gaps in closeness are mainly due to the empirical structure of word-word similarities in the real layers rather than to the act of combining layers. Notice also that the current investigation cannot provide any causality link, since the structure itself is unable to fully identify the nature of the priming patterns found in the literature, as these patterns are heavily influenced by other aspects of lexical retrieval such as attention [35], modality [28,36], and timing between prime and target [37]. Addressing through experiments the challenges opened by the current multiplex network investigation on priming would also require a more thorough investigation of the factors influencing priming beyond the mental lexicon structure, such as different stimulus onset asynchrony determining the strength of positive priming [35,37] or different modalities affecting the extent of negative semantic priming [35]. This rich variety of priming patterns underlines the importance of further multilayer modelling efforts for the understanding of priming effects in language-related tasks.
One limitation of the original cohort model was that it neglected the influence that semantics exerts over lexical retrieval in perceptual tasks [31,41], an element that is taken into account in more refined models of word processing [41,43] and confirmed also by experimental studies [28]. Interestingly, the fact that free associations displayed a significant gap in closeness for rhyming but not for cohorts might be a consequence of the different positions of phonemes. Relying on the last phonemes would allow for a temporal unfolding to occur, during which the first part of the word would be acquired and some of its semantic features would be available for processing, features that cannot be available when the first phonemes are heard instead. This difference reconciles the finding that in rhyme priming there is a closeness gap also in free associations, a gap that is absent when cohorts are considered. This quantitative difference indicates that rhyme priming is more heavily influenced by semantic information compared to cohort priming.
A limitation of the multiplex approach is that it does not consider individual variability. It is expected for lexical retrieval to be influenced also by individual factors such as fluid intelligence or other active cognitive search strategies [28,42,43]. Even creativity levels have been recently shown to deeply influence lexical retrieval and word identification in healthy populations [3,11,55]. One possibility for overcoming this limitation could be the substitution of the layer of free associations with other empirical layers, always of free associations but obtained from subjects belonging to a specific population, for instance highly creative people. Previous research has shown that more creative people tend to associate even semantically unrelated concepts [3,11,55], so that new shortcuts might appear in the free association layer. These paths might alter the results found in the current investigation for normative subjects. Considering other ad hoc layers of free associations could also be a valuable research direction for generalising the model in order to incorporate ageing. Recent work has shown that over time the mental lexicon undergoes some substantial changes and some word-word similarities get lost [19], thus potentially altering the shortcuts connecting words in cohorts or rhyme classes. A reduction of priming effects with age is expected, particularly the one due to rhyming, which has been empirically shown to decrease in strength from childhood to adulthood [38].
Also, the investigation of clinical populations could be interesting for future research [17,24]. If the shortcuts allowing for cohort and rhyme classes were resilient to progressive word failure in people with aphasia, these word-word associations might be used for designing strategies of intervention for restoring or mending the functionality of the mental lexicon. The framework of multiplex lexical networks has already been applied to clinical populations with aphasia [16], and it showed that word production in subjects with aphasia crucially depends on the closeness that words have over the multiplex lexical structure. Words with higher closeness centrality were easier to pronounce in picture naming tasks compared to words with lower closeness. Investigating potential differences between words in cohorts/rhyme classes and specific null models would represent an interesting research direction.
All in all, multiplex lexical networks represent a powerful framework for the quantitative investigation of psycholinguistic patterns where the interplay between different semantic and phonological aspects of language is relevant. The multiplex structure of these linguistic networks opens new important challenges for the large-scale understanding of the cognitive processes driving language usage.

Figure 1: Network visualisation of a portion of the adopted multiplex lexical network. The whole multiplex representation contains 8546 words. Semantic layers are clustered together (free associations, synonyms, and generalisation) and represent multiple aspects of semantic memory. Phonological information is represented as a network of phonological similarities, where words differing by one phoneme were linked together. The resulting multiplex lexical network is an edge-coloured graph where links of different types coexist (see right panel).

Figure 2: Cumulative probability distribution P(X ≥ f) of finding a word with frequency at least f in the multiplex lexical network with N = 8531 words (blue dots), in the Opensubtitles dataset with 5 · 10^5 unique words (orange squares), and in the subset of N = 8531 most frequent words from Opensubtitles (green diamonds). A power-law with exponent −1.8 is reported for visual comparison (dashed line).

Figure 4: (a) Mean closeness distance of words either in cohorts or in randomised lists in the layers, respectively, made of free associations (Asso.), synonyms (Syno.), generalisations (Gene.), and phonological similarities (Phon.). (b) The same as in (a) but for randomised layers of word-word similarities.