Navigating Concepts in the Human Mind Unravels the Latent Geometry of Its Semantic Space

Department of Information Engineering and Computer Science, University of Trento, Via Sommarive 9, 38123 Povo, Trento, Italy
CoMuNe Lab, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Trento, Italy
Department of Mental Health, Division of Psychology, Azienda Provinciale per i Servizi Sanitari, Viale Verona 38123, Trento, Italy
DPCS, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Trento, Italy


Introduction
The retrieval of basic knowledge from memory, known as semantic memory [1], has long been the focus of a lively debate across multiple research fields. The debate divides mainly into two schools of thought: semantic space and semantic networks. According to the semantic space school, search is a key cognitive feature that operates similarly across different scales and contexts [2]. In many domains, search at different scales (from looking for an object in a bag to looking for a disease cure) always requires managing the trade-off between exploiting what is known and exploring what is unknown [2]. In this sense, the internal search of memory retrieval exhibits characteristics similar to those of external search in physical space [3]. According to the theory of optimal foraging [3], the process of retrieving concepts from memory is dynamically similar to the one performed by animals when searching for food between patches of their environment [4]. This mental dynamical process mediates between local exploitation of clusters of information and global exploration of such clusters, pursuing a sort of semantic foraging [3]. In accordance with the marginal value theorem [5], semantic memory search is considered optimal if the subject, as the animal does in optimal foraging, leaves a given cluster of information when the benefit of local exploitation falls to the level of the expected benefit of changing cluster and searching elsewhere [3]. In the clinical field, the intuition that patients cognitively organize semantic access around semantic clusters, following a clustering and switching pattern in search, has been widely used to investigate semantic retrieval [6] and semantic impairment [7].
However, in this method, clusters are based on hand-made classifications according to taxonomies; this limitation can be partially overcome by taking advantage of distributional semantics to define the clustering and chaining of concepts during a semantic memory retrieval task [8]. In summary, according to the semantic space school of thought, modelling search in semantic memory needs two main ingredients: (a) a structural representation of the search space (hand-coded or statistically derived) and (b) a model of the search process (e.g., local to global transitions) [4]. However, there is still no clear definition of what a patch is and how to define it in memory [4]. Concurrently, another school of thought, that of semantic networks, demonstrated that the same results obtained with optimal foraging in semantic space could emerge from a random walk exploring a semantic network [9,10]. Instead of the clustering and switching processes, this network approach postulated a simpler, single process of exploration on a network of concepts. According to this approach, the navigation of concepts is represented by associative semantic networks [11], fostering the idea that concepts are cognitive units, each represented as a node linked to associated elements [12][13][14][15]. An objection often leveled against semantic networks is that they might end up explaining, or predicting, memory retrieval by leveraging models built from similar behaviors, for example, when semantic networks built from free association data are used to explain semantic fluency tasks [16]. Progress in building such networks from fluency data has been made [17]; however, there is still no consensus about the most appropriate way to construct semantic networks [18].
Nevertheless, the semantic network approach has been widely used in the clinical field for the assessment of psychosis [19] and Alzheimer's disease [20][21][22], and in cognitive science, for example, to investigate levels of creativity [23] and openness to experience in human beings [24]. Over the past two decades, vector-space models, which represent word meaning as high-dimensional numerical vectors, have become serious contenders for semantic representation [25], for example, when studying human psycholinguistic tasks [26] or when exploring semantic verbal fluency in mild cognitive disorder [8]. Powerful tools involving this kind of spatial representation of words are the so-called word embeddings, bridging distributional semantics and natural language processing, which map words into vectors in a multidimensional space [27]. The underlying idea of this approach can be summarized in the words of the English linguist J. R. Firth, "a word is characterized by the company it keeps", which, from a mathematical perspective, means that the closer the words in the multidimensional space, the closer their meaning in the vocabulary.
Language and semantic memory retrieval tasks are crucial in the identification of neurodegenerative diseases [28][29][30] and are usually employed in different neuropsychological tests. Among these tests, categorical semantic verbal fluency (SVF) tests play an important role in the assessment of dementia, and of Alzheimer's disease in particular [6,31]. Here, patients are asked to say as many words as possible belonging to a certain category within a given time interval. Patients' performance is then evaluated, in particular by counting the number of words pronounced [32] or by measuring response times [33]. To investigate semantic retrieval, further approaches, based on the intuition that patients cognitively organize semantic access around semantic clusters, have been widely used [6]. Recently, evidence of semantic maps tiling the human cerebral cortex has been provided from fMRI data, probing the existence of semantic selectivity in brain areas [34] and further strengthening the insight that language can be organized on a topological space, i.e., on a manifold. Nevertheless, a clear understanding of the mechanisms behind the navigation of semantic memory still eludes us [35,36]. The aim of this work is to provide data-driven insight into why the hypothesis of a spatial representation, i.e., of an underlying, latent geometry characterizing the human mind, is plausible. Relying on clinical data from semantic verbal fluency tests of 215 subjects and leveraging word embeddings, we aim to define suitable metrics to indirectly explore this possible latent geometry. Our work arises within the debate between semantic space and semantic network representations, and it examines the exploration process by integrating both perspectives, space and networks, in an approach that builds upon the geometry at the concept scale and culminates with diagnosis-based semantic networks, passing through the mesoscale organization of concepts (clustering).
Remarkably, our framework allows us to gain new insights into the organization of concepts in the human mind and sheds some light on why some existing approaches were successful. In fact, the mechanisms behind the retrieval of basic knowledge, known as semantic memory [1], still remain fundamentally unknown [37]. Here, we fill this gap by (i) hypothesizing the existence of an underlying geometry, which governs the exploration of concepts in the human mind, and (ii) demonstrating that such a geometry can discriminate between healthy subjects and patients at different stages of dementia. Our hypotheses are based on the assumption that if a common latent geometry underlying the mental navigation of concepts existed, then subjects with semantic retrieval deficits should show some distortion in their navigation on top of this geometry. Here, we observe how populations of subjects with different degrees of semantic impairment navigate the same geometry differently, by means of suitable metrics characterizing their explorations. If our hypothesis is reasonable, we expect to see significant differences in the metrics computed for different diagnoses.
Our study is based on the analysis of semantic verbal fluency (SVF) data, for the category of animals, from 92 patients suffering from dementia (DEM, M = 40%, F = 60%, age = 75 ± 7, years of education = 9 ± 4), 93 patients suffering from Mild Cognitive Impairment, a precursor of Alzheimer's disease (MCI, M = 48%, F = 52%, age = 77 ± 6, years of education = 9 ± 4), and 30 healthy controls (CTR, M = 60%, F = 40%, age = 32 ± 7, years of education = 17 ± 0.40). During the semantic verbal fluency test, each individual is asked to report all the words he/she can remember belonging to the category of animals, within a time interval of 60 seconds. Each spoken word is annotated by the neuropsychologist who is testing the patient. Neither clues nor incentives are given to the subjects during the tests, and repetitions are not marked. The SVF test is a significant test for the assessment of dementia diagnosis [31]. Generally, semantic impairment is more severe in patients with dementia than in MCI patients. The rationale for looking at the semantic fluencies of these two populations is to test our conjecture that, if a latent geometry existed, a different severity of semantic memory retrieval impairment should be reflected in a different way of navigating concepts on such a plausible latent geometry. MCI subjects have an increased risk of conversion to Alzheimer's disease (and dementia in general). Testing the differences between the exploration of concepts of these two populations and of a control group by means of spatial metrics might therefore be relevant to gain insight, from a data-driven perspective, into how different categories of subjects navigate. Instead of focusing only on statistical descriptors of language in disease, e.g., word frequency or vocabulary size, we also considered the sequence in which the words were provided; this information is crucial because it allows us to map the navigation of concepts in the underlying semantic space.
To characterize the navigability of this space in terms of the concepts visited in such an unknown, possibly multidimensional space, we first had to build a plausible geometric proxy (illustrative representation in Figure 1).
To this aim, we used three distinct word embeddings obtained from the Italian language, namely, Italian Word Embeddings, trained on the Italian Wikipedia [38]; itWaC, constructed from the Web by limiting the crawl to the .it domain and using medium-frequency words from the Repubblica corpus and basic Italian vocabulary lists as seeds [39]; and Twitter, trained on 46,935,207 tweets [39]. All the word embeddings were generated with the popular word representation model word2vec [27]. By choosing three different word embeddings, we are able to evaluate the robustness of our metrics in geometries coming from different sources, i.e., a website (Wikipedia), a social network (Twitter), and a newspaper (La Repubblica). In the following, we will refer to word embeddings, semantic spaces, or geometries interchangeably. Here, the term geometry is justified by the fact that we leveraged word embeddings, powerful tools that encode the semantic relation between words as a geometric relationship between vectors in a multidimensional space. Word embeddings are built from human-written documents (in our case, Wikipedia, Twitter, and La Repubblica, an Italian newspaper), whose words are then embedded in a multidimensional space according to the distributional hypothesis. This hypothesis defines semantic similarity in terms of vector similarity; i.e., the closer the meaning in the vocabulary, the closer the points representing the words in the word embedding (each encoded by a vector of coordinates in a multidimensional space). In this sense, by mapping the words pronounced by a sample of subjects into a word embedding, which is a coordinate space by design, we can study the mental navigation of such subjects on a geometry of concepts. For each group of subjects, we therefore have three independent semantic spaces; each one is used to characterize the local exploration and the overall navigation of the semantic geometry.
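As an illustration of how semantic relatedness becomes a geometric quantity, the following minimal Python sketch computes the Euclidean and cosine distances between word vectors. The three-dimensional vectors and the name `toy_embedding` are hypothetical and serve only as an example; they are not taken from any of the three embeddings, which have many more dimensions.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two word vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between two non-zero word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy vectors: two land mammals close together, a fish further away.
toy_embedding = {
    "cane": [0.9, 0.1, 0.0],
    "gatto": [0.8, 0.2, 0.1],
    "squalo": [0.1, 0.9, 0.3],
}
```

With these toy values, `cosine_distance(toy_embedding["cane"], toy_embedding["gatto"])` is smaller than the distance between "cane" and "squalo", mirroring the idea that closer meanings correspond to closer points.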
More specifically, we introduce five different descriptors for this purpose, in order to identify the effects of the underlying geometry, if any. At the smallest scale, i.e., that of single concepts, geometry is probed in terms of:
(1) Maximum jump MaxJ, i.e., the maximum distance, in the word embedding, between two consecutive words pronounced during the test. It defines the maximum instantaneous capacity to change context.
(2) Diameter of exploration DOE, i.e., the maximum distance, in the word embedding, between any two words pronounced during the test, whatever the order. It defines the maximum capacity to change context over the whole test duration. To be consistent, we call this metric DOE when it is computed with the Euclidean distance and amplitude of exploration (AOE) when it is computed with the cosine distance.
(3) Density of exploration ρ_w, i.e., the total number of animal words potentially available in the hypersphere that has radius R, the exploration radius equal to half the DOE, and is centered at the centroid C of the spoken words in the word embedding. It measures the density of pertinent words (i.e., words belonging to the category of animals) in the volume explored by the subject in the geometry.
(4) Distance d, i.e., the total distance covered during the test. It quantifies the magnitude of the overall exploration.
(5) Farness far, i.e., the average distance of the words pronounced. It defines the ability to go far with a certain number of jumps.
For mathematical details about each descriptor, we refer to section Methods, while their significance in discriminating the three groups of subjects is evaluated by means of Kolmogorov-Smirnov statistical tests and t-tests.
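Under stated assumptions, the five descriptors can be sketched in Python as follows. The function and variable names are hypothetical, the Euclidean distance is used by default, and farness is taken here as the total distance divided by the number of jumps, which is only one possible reading of its verbal definition; the formal definitions remain those of the Methods section.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def exploration_descriptors(path, animal_vectors, dist=euclidean):
    """Compute the local descriptors for one subject.

    path: vectors of the spoken words, in the order they were pronounced.
    animal_vectors: vectors of all animal words available in the embedding.
    dist: distance function (Euclidean by default; a cosine distance could be passed).
    """
    if len(path) < 2:
        raise ValueError("at least two spoken words are needed")

    # Distances between consecutive words (the "jumps").
    jumps = [dist(a, b) for a, b in zip(path, path[1:])]
    max_jump = max(jumps)                                         # MaxJ
    diameter = max(dist(a, b) for a, b in combinations(path, 2))  # DOE
    total_distance = sum(jumps)                                   # d
    # ASSUMPTION: farness read as the mean jump length.
    farness = total_distance / len(jumps)                         # far

    # Hypersphere centered at the centroid C with radius R = DOE / 2.
    centroid = [sum(coords) / len(path) for coords in zip(*path)]
    radius = diameter / 2.0
    # Number of animal words falling inside the hypersphere.
    density = sum(1 for w in animal_vectors if dist(w, centroid) <= radius)

    return {"MaxJ": max_jump, "DOE": diameter, "rho_w": density,
            "d": total_distance, "far": farness}
```

Passing a cosine distance as `dist` would yield the amplitude of exploration (AOE) in place of the DOE.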
Afterwards, in line with the idea that semantic access is cognitively organized around semantic clusters [3,6], we probed the mesoscale organization of concepts by running a semisupervised clustering algorithm in the three geometries. Accordingly, we define the explorative potential of the navigation for each category as the total number of visited clusters and as the total number of words included in the visited clusters. This descriptor is a proxy for the cognitive effort that can be spent during the navigation, and it defines the total number of clusters/words that can potentially be visited/retrieved during the test. Clusters are then given as input to a hierarchical clustering algorithm, which provides the spatial hierarchy of such clusters based on their relative distance. By comparing the distances between the visited clusters, we are able to evaluate the existence of a hierarchy in the way subjects explore concepts (technical details about clustering and explorative potential can be found in the Methods section).
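Assuming that a cluster label is already available for every animal word in the embedding, the explorative potential can be sketched as below. The dictionary `word_to_cluster` and the function name are hypothetical, and the clustering step itself is not reproduced here.

```python
def explorative_potential(spoken_words, word_to_cluster):
    """Return (number of visited clusters, number of words in the visited clusters).

    spoken_words: words pronounced during the test.
    word_to_cluster: mapping from every animal word in the embedding to its
        cluster label (assumed to come from a previous clustering step).
    """
    # Clusters touched by at least one spoken word; unknown words are skipped.
    visited = {word_to_cluster[w] for w in spoken_words if w in word_to_cluster}
    # All words, spoken or not, that belong to a visited cluster.
    words_in_visited = sum(1 for c in word_to_cluster.values() if c in visited)
    return len(visited), words_in_visited
```

For example, a subject who says three words spread over two clusters containing five words in total would obtain the pair (2, 5).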

Complexity
Taking inspiration from the process of clustering and switching when retrieving concepts from memory, network scientists introduced a new kind of random walk over a graph, modelled as a Markov process, i.e., the switcher random walk [40], to generalize the exploration task on a network. In this vein, and following the assumption of a semantic network navigated by a random walk [10], we finally tested the navigation of concepts by means of their Markov representation, to probe possible alterations of the mental pathways emerging from the exploration of concepts in patients with dementia. Mathematically, this corresponds to defining the transition probability from one state (i.e., cluster) to another, regardless of the previously visited states. Operationally, we build three Markov chains, one for each group, i.e., the two diagnoses and the healthy controls, considering all the clusters visited by each group as the states of the Markov chain of that group and setting the transition probabilities equal to the relative transition frequencies from one state to another in each group. Each Markov chain is characterized by its steady state distribution and its mean first passage time (MFPT) matrix. In the following, we provide the intuition behind these two descriptors for each network of concepts. Both serve to investigate and characterize the search process pursued by each diagnosis on its network of concepts. Since the network of concepts is modelled as a Markov chain, the steady state distribution and the MFPT are the key descriptors of its navigation dynamics. We assume that if different diagnoses explore the network of concepts in different ways, the steady state distributions and the MFPT matrices should highlight these differences.
In fact, the steady state distribution is the unique distribution to which the exploration converges as the number of transitions increases, regardless of the Markov chain's initial state. Here, the steady state is a vector, computed for each category, representing the probabilities of being in each of the clusters of words visited by that category after a sufficient amount of time. Note that each subject has one minute to complete the SVF test, but in practice no patient uses it all, because patients run out of words before the minute ends. In this sense, one minute is enough to reach a stationary regime, which is mathematically represented by the steady state distribution; it would, in any case, be experimentally impossible to test a subject for an infinite amount of time. Our intention is to compare metrics that uniquely identify the pattern of exploration, as given by the steady state and the MFPT, for each category of subjects; in this way, we can detect possible differences between such patterns. As for the MFPT matrix, it encodes the mean amount of time required to go from one state i to another state j of a Markov chain. In our case, the MFPT matrices encode the mean number of transitions needed to go from one cluster of words to another. Specifically, we define an MFPT matrix for each diagnosis. The entries of such a matrix answer the question: starting from one cluster of words i, how long does it take, on average, for this specific category to reach a specific cluster of words j for the first time? In this sense, the MFPT matrix characterizes the exploration dynamics of each diagnosis, since it returns an average measure of the time spent navigating the underlying network of concepts. In summary, the MFPT matrix gives the average number of steps needed to reach a certain state from another for the first time.
This idea, redefined on the network of concepts, corresponds to the average time required for each diagnosis to pass from one cluster to another for the first time, and thus enables us to measure the time needed to travel, for the first time, a certain mental link connecting two groups of concepts. Taken in isolation, the steady state distribution (ss) and the MFPT can give us insight into how the exploration of each diagnosis evolves on the network and over time. For example, they tell us how heterogeneously the clusters will be explored after a sufficient number of transitions (ss) and how much time it takes before a cluster is visited for the first time (MFPT). To investigate possible differences in the dynamics of exploration between the diagnoses and the healthy control group, these descriptors are then compared between the three groups by means of similarity measures, i.e., Pearson correlation, Spearman's correlation, Euclidean norm, Frobenius norm, and covariance (for mathematical details about Markov chains, we refer the reader to the Methods section).

[Figure caption] On the left, navigation of concepts in the semantic space; arrows define the sequence of words. On the right, a zoom outlining the navigation: w_n are the concepts, w_x is their centroid, and R is the radius of exploration. This figure has been generated using the 3D design software SketchUp 2020.
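The Markov-chain descriptors can be illustrated with a minimal Python sketch, under assumptions that are ours and not the paper's: clusters are encoded as integers 0..n-1, a cluster with no observed outgoing transition receives a uniform row so that the matrix stays stochastic, and the chain is irreducible and aperiodic so that the simple iterations below converge. The steady state π satisfies π = πP, while the MFPT entries satisfy m_ij = 1 + Σ_{k≠j} P_ik m_kj. All function names are hypothetical.

```python
def transition_matrix(sequences, n_states):
    """Row-normalized transition frequencies between clusters.

    sequences: one list of visited cluster indices per subject of the group.
    """
    counts = [[0.0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for i, j in zip(seq, seq[1:]):
            counts[i][j] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        # ASSUMPTION: uniform row when no outgoing transition was observed.
        matrix.append([c / total for c in row] if total
                      else [1.0 / n_states] * n_states)
    return matrix


def steady_state(P, tol=1e-12, max_iter=100_000):
    """Stationary distribution by power iteration (ergodic chain assumed)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def mean_first_passage_times(P, tol=1e-12, max_iter=100_000):
    """MFPT matrix M, with M[i][j] the mean number of steps from i to j.

    Each column is found by fixed-point iteration of
    m_i = 1 + sum_{k != j} P[i][k] * m_k; the diagonal entry M[j][j]
    is then the mean return time to j.
    """
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        m = [0.0] * n
        for _ in range(max_iter):
            new = [1.0 + sum(P[i][k] * m[k] for k in range(n) if k != j)
                   for i in range(n)]
            converged = max(abs(a - b) for a, b in zip(new, m)) < tol
            m = new
            if converged:
                break
        for i in range(n):
            M[i][j] = m[i]
    return M
```

For larger chains, solving the corresponding linear systems directly would be more efficient; the iterative form is used here only to keep the sketch dependency-free.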

Geometry.
Overall, the metrics defined to characterize the local exploration prove suitable for discriminating between healthy and nonhealthy subjects in all three spaces. The results for the three semantic spaces are shown in Figure 2. Specifically, according to the results of the t-tests, all the metrics in all geometries, except for far in the Twitter geometry, are able to discriminate between healthy and nonhealthy subjects, all having p values ≤0.0104 (see Table 2 of Supplementary Materials for detailed results of the t-tests). Likewise, according to the results of the Kolmogorov-Smirnov (KS) test, all the metrics, except for MaxJ in the itWaC geometry and far in the Twitter geometry, are able to discern between healthy and nonhealthy subjects, all having p values ≤0.0304 (see Table 1 of Supplementary Materials for detailed results of the Kolmogorov-Smirnov test).
Remarkably, the distance d is always significant not only in discerning between healthy and nonhealthy subjects but also between different stages of dementia, according to both the Kolmogorov-Smirnov test and the t-test (p values of the KS test are ≤0.0209 in all the geometries; p values of the t-test are ≤0.0076 in all the geometries). Interestingly, in the Wikipedia geometry, the KS test shows that all the metrics except far and MaxJ are significant (all having p values ≤0.0233) in distinguishing between all three categories, i.e., healthy controls and the two stages of dementia, MCI and DEM. Likewise, for the t-tests, all the metrics except far are significant (all having p values ≤0.017867) in separating the three categories. Results on local exploration can be summarized as follows: (i) all the metrics can be used in all the geometries to separate healthy from nonhealthy subjects, except for MaxJ in the itWaC geometry and far in the Twitter geometry; (ii) to discriminate between the three categories (DEM, MCI, and healthy), all the metrics except far and MaxJ can be used, but only in the Wikipedia geometry; (iii) the distance d is robust in separating all three categories across the three word embeddings and should be used when considering the itWaC and Twitter geometries to discriminate between different stages of dementia.
Detailed results of Kolmogorov-Smirnov statistical tests and t-test for each metric are reported in Tables 1 and 2 of Supplementary Materials.
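To make the group comparison concrete, the two-sample Kolmogorov-Smirnov statistic, i.e., the largest gap between the empirical cumulative distribution functions of a metric in two groups, can be sketched as follows. This is only an illustration with a hypothetical function name: it does not compute p values, and the software actually used for the tests is not specified here. In practice, a statistics library (for instance `scipy.stats.ks_2samp` and `scipy.stats.ttest_ind`) could provide both statistics and p values.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of observations less than or equal to x.
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    # The supremum is attained at one of the observed values.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)
```

A value close to 1 indicates that a metric takes almost non-overlapping values in the two groups, whereas a value close to 0 indicates nearly identical distributions.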

Hierarchy.
The explorative potential is able to discriminate between healthy and nonhealthy subjects according to both KS tests and t-tests (p values ≤0.0026; see Tables 3 and 4 of Supplementary Materials for detailed results), strengthening what we found at the local scale. Figure 3 shows the tanglegrams for the pair MCI-DEM in the three semantic spaces, together with the values of Baker's Gamma correlation [41] compared with the null models for all the pairs. What emerges clearly from this analysis is the strong correlation between MCI and DEM in the hierarchy through which the concepts are explored, as evidenced by the values of Baker's correlation, equal to 0.88 in itWaC, 0.97 in Twitter, and 0.73 in Wikipedia, and as validated by the null model, in contrast with the correlation for all other pairs. This is particularly remarkable when compared with the values of correlation between the stages of dementia and healthy controls, which, instead, are always close to zero in the three geometries (two-dash lines in Figure 3).

Networks.
The Markov chains modelling the exploration of concepts are displayed in Figure 4 and can be considered as a proxy of the semantic networks for each group of subjects. The number of nodes of such networks varies within the same group across the spaces because of the clustering mapping (itWaC: CTR 19, MCI 13, DEM 11; Twitter: CTR 16, MCI 12, DEM 9; Wikipedia: CTR 32, MCI 25, DEM 28). Overall, nonhealthy patients explore a smaller portion of the semantic nodes than healthy controls. Note that, for the itWaC and Twitter geometries, there is a progressive decrease in the number of visited nodes going from CTR to MCI and from MCI to DEM. Not all the considered correlation measures between the steady states and the mean first passage time matrices agree in ranking the similarity between the analyzed groups, and only some specific combinations of geometry and correlation measure highlight a higher correlation for the pair MCI-DEM. In particular, this is true for the values of Spearman correlation in the itWaC and Twitter geometries and for those of Pearson correlation in Wikipedia (specific results are reported in Tables 5-7

Discussion
In this work, we investigated the assumption related to the semantic space by testing how plausible the hypothesis of a latent geometry underlying the exploration of concepts in the human mind is, and whether this geometry can be used to discriminate between healthy and nonhealthy subjects. Our hypothesis is tested against different types of navigation, i.e., the one coming from healthy subjects and the one coming from subjects with a deficit in semantic memory retrieval tasks according to a prior clinical evaluation. By means of suitable metrics characterizing the spatial navigation of concepts on three distinct word embeddings, and relying on data from 215 semantic verbal fluency tests, we have shown why it is plausible that the mental navigation process takes place on a latent geometry, understood as an organized manifold of lexical information. In this respect, the geometry of the word embeddings acted as a proxy of a potential geometry of the human mind, intended as the setting where information is somehow organized when navigating concepts. We examined the exploration process by integrating two main perspectives, space and networks, in an approach that builds upon the geometry at the concept scale and culminates with diagnosis-based semantic networks, passing through the mesoscale organization of concepts. On the one hand, semantic networks do not provide sufficiently strong evidence to discern between the groups of subjects considered (CTR, MCI, and DEM).
In fact, results vary according to word embeddings and correlation measures, so the approach proves indicative but not definitive. On the other hand, the geometric approach gives significant results in revealing differences between healthy and nonhealthy subjects through local descriptors and in highlighting the similarities between patients with Mild Cognitive Impairment and patients with dementia through hierarchy. Note that the distance d is always significant not only in discerning between healthy and nonhealthy subjects but also between different stages of dementia in all three geometries. Intriguingly, in the Wikipedia geometry, all the metrics, except for farness and maximum jump, are able to separate all three categories. In short, if we had to choose one metric that can separate the three categories (DEM, MCI, and healthy controls), whatever the geometry, we would choose the distance d. If, instead, we had to choose a semantic space that can capture the differences between the three categories in all the considered metrics (excluding farness and maximum jump), we would choose the Wikipedia geometry. Finally, it is always possible to distinguish between healthy and nonhealthy subjects in all the geometries whatever the metric, except for MaxJ in the itWaC geometry and far in the Twitter geometry (this is demonstrated by the results of the Kolmogorov-Smirnov tests and t-tests, reported in Tables 1 to 4 of the Supplementary Materials). Our results suggest that the metric, coupled with the word embedding, should be chosen according to the purpose (i.e., discriminating between healthy and nonhealthy subjects and/or discriminating between all three considered categories, DEM, MCI, and healthy). It is worth noting that the Wikipedia word embedding is a multidimensional space of 300 dimensions, that is, more than double that of the other word embeddings used in this study (itWaC and Twitter), which have 128 dimensions. This means that, to some extent, the Wikipedia geometry contains more information encoded in the relationships between words. Thus, it is possible that the metrics computed in the Wikipedia geometry can discriminate between all three categories precisely because of the richer information stored in this word embedding. We conclude that the geometric framework is an effective and robust approach to investigate semantic memory retrieval and to assess its abnormal navigation in patients at different stages of dementia. For this reason, our metrics could be used in support of the clinical assessment as a data-driven tool for confirming, though not yet predicting, the diagnosis. This would help in planning the longitudinal referral, for example, by establishing a six-month visit interval for DEM patients and a one-year interval for MCI patients, sparing the latter the stress of closely spaced visits. Our investigation represents the very first step toward a new data-driven framework to eventually predict diagnoses from fluency data, once much more of such clinical data becomes available. In this regard, a Bayesian mixed effects model would be a powerful tool to obtain a grounded and informative inference on the relationship between different key variables, such as the diagnosis label, the demographics of the population class (age, sex, and education), the semantic space (itWaC, Twitter, and Wikipedia), and the value of the metrics in each semantic space. In addition, further development of this work should include a cohort of elderly healthy controls.

[Figure caption] Networks of concepts as reconstructed from semantic verbal fluency tests, for the three semantic spaces. Colored nodes encode clusters of concepts reported by patients while performing the test, in which they are asked to report words belonging to the category of animals. The size of the nodes is proportional to the nodes' strength, while the thickness of the edges is proportional to their weight. This figure has been generated using the publicly available R software, version 3.6.3.
Moreover, knowing whether a patient performs better in the density of explored concepts (many words of similar meaning) or in changing context (MaxJ, DOE) could help develop future targeted cognitive stimulation based on the values of such metrics. Cognitive stimulation [42,43] is useful in preventing patients from abusing pharmacological therapy, in favor of personalized and more targeted exercises for the maintenance of residual capacities. In other words, improving our understanding of memory retrieval tasks and impaired cognitive search could considerably improve the quality of life of people with dementia, who are often prone to develop secondary diseases, such as depression [44], related to the inability to express or recall concepts. Finally, given the robustness of our results in separating healthy and nonhealthy subjects, the geometric approach could be used to develop digital pretriage tools. In this way, by means of the metrics proposed in our work, patients could be divided into the two macro categories, healthy and nonhealthy, before the clinical examination. This would be of great help in avoiding unnecessary visits to healthcare facilities. Our goal might seem ambitious and definitely challenging, but perhaps not so unrealistic considering the historical moment we are living in due to COVID-19. In fact, preventing the subjects most susceptible to the risks of the pandemic, such as elderly people suspected of dementia, from unnecessarily going to healthcare facilities could considerably safeguard their lives.

Dataset.
The dataset we relied on consists of Semantic Verbal Fluency (SVF) test records of 185 patients and 30 healthy controls (CTR). Among the patients, 92 suffer from dementia (DEM), which includes vascular dementia, frontotemporal dementia, degenerative dementia, and Alzheimer's disease, while 93 suffer from Mild Cognitive Impairment (MCI), a precursor of Alzheimer's disease. The SVF records report the sequence of Italian words, belonging to the category of animals, spoken by each patient and control subject during the test. Our work is a retrospective study of data previously collected by the Department of Mental Health, Division of Psychology, Azienda Provinciale per i Servizi Sanitari, in Trento, Italy. All the data were collected in accordance with relevant guidelines and regulations, with participants' written informed consent. DEM and MCI diagnoses were also made at Azienda Provinciale per i Servizi Sanitari of Trento, Italy, by consensus of medical specialists such as geriatricians, neurologists, or psychiatrists, on the basis of physiological, instrumental, and test results (including the Cornell scale for depression in dementia, Activities of daily living, and Instrumental activities of daily living). Grouping together different types of dementia is motivated by the small number of samples available for each dementia category. By grouping together all dementias, we obtain a sample that is comparable in size with that of MCI. For our assessment, we also rely on official and specialist sources, which report: "the boundaries between different forms of dementia are indistinct and mixed forms often co-exist" (WHO, https://www.who.int/news-room/fact-sheets/detail/dementia).
The Semantic Verbal Fluency (SVF) tests were conducted at the Department of Mental Health, Division of Psychology, Azienda Provinciale per i Servizi Sanitari, in Trento, Italy, following a specific clinical protocol. In particular, the neuropsychologist asked each individual to report all the words he/she could remember belonging to the category of animals, within a time interval of 60 seconds. No clues or incentives were given to the subjects during the tests. As soon as the patient pronounced a word, the neuropsychologist took note of it by hand, together with the order in which the words were pronounced; repetitions were not marked.

Semantic Space.
To define the semantic space, we leveraged word embeddings as a plausible geometric proxy of such a space. In particular, we used three distinct word embeddings, obtained from Italian text sources, namely itWaC, Twitter, and Wikipedia [39].
In order to get the number of animal words potentially available in the hypersphere built from the exploration radius R of spoken words for each subject, we translated into Italian the list of animals made by Greg Borenstein, available on GitHub at https://gist.github.com/atduskgreg/3cf8ef48cb0d29cf151bedad81553a54. This list is used to compute the density of exploration $\rho_w$, as specified in the next paragraph.

Geometry.
At the scale of single concepts, we provide five different indicators useful to characterize the local exploration of concepts and eventually to discriminate between healthy and nonhealthy subjects. Each subject $p$ speaks a number $N$ of words during the SVF test; we call this set of words $W_p = \{w_i\}$ with $1 \le i \le N$. For each patient and for the healthy controls, we define the following metrics:
(1) Maximum jump MaxJ: the maximum distance, in the semantic space, between two consecutive words pronounced during the test. It defines the maximum instantaneous capacity to change context as follows:
$$\mathrm{MaxJ}_p = \max_{1 \le i < N} \mathrm{dist}(w_i, w_{i+1}),$$
where dist can be either the Euclidean distance or the cosine distance. Results are presented (in section Results) considering the cosine distance for all the metrics.
(2) Diameter of exploration DOE: the maximum distance, in the word embeddings, between the words pronounced during the test, whatever their order; it defines the maximum capacity to change context over the whole test duration as follows:
$$\mathrm{DOE}_p = \max_{1 \le i < j \le N} \mathrm{dist}(w_i, w_j).$$
According to the measure of distance, i.e., Euclidean or cosine distance, this metric is defined, respectively, as Diameter of exploration (DOE) or Amplitude of exploration (AOE).
(3) Density of exploration $\rho_w$: the total number of animal words potentially available in the hypersphere built from the exploration radius $R$, half the DOE, which has as its center the centroid $C$ of the spoken words in the semantic space, as follows:
$$\rho_w = \left| \{ a \in A : \mathrm{dist}(a, C_p) \le R \} \right|,$$
where $C_p$ is the centroid of the words spoken by patient $p$, defined for each coordinate $x$ of the semantic space as follows:
$$C_p(x) = \frac{1}{N} \sum_{i=1}^{N} w_i(x).$$
According to the dimension of the word embedding, the centroid has 300 or 128 dimensions, while $A$ is the complete set of animals in the word embeddings, and $R$ is the radius of exploration, i.e., half the DOE.
(4) Distance $d$: the total distance covered during the test as follows:
$$d = \sum_{i=1}^{N-1} \mathrm{dist}(w_i, w_{i+1}).$$
(5) Farness far: the average distance of the words pronounced; it defines the ability to go far with a certain number of jumps.
The significance of the above defined indicators in discriminating the three groups of subjects is evaluated by means of the Kolmogorov-Smirnov statistical test and the t-test, at a 95% confidence level (detailed results available in Tables 1 and 2 of Supplementary Materials). Note that we are testing whether each metric can separate the three categories DEM, MCI, and healthy controls (in each word embedding). For this reason, being in a case of multiple testing, we have adjusted the p values of each performed test (Kolmogorov-Smirnov and t-test) according to the Holm-Bonferroni method.
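The geometric indicators defined above can be illustrated with a minimal sketch. It assumes that each spoken word has already been mapped to its embedding vector (a plain sequence of floats); the function names and the toy two-dimensional vectors are hypothetical and only serve the illustration. Farness is left out because its normalization is not spelled out here.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.dist(u, v)


def max_jump(words, dist):
    """MaxJ: largest distance between two consecutive spoken words."""
    return max(dist(a, b) for a, b in zip(words, words[1:]))


def diameter(words, dist):
    """DOE (or AOE): largest distance between any two spoken words."""
    return max(dist(a, b) for i, a in enumerate(words) for b in words[i + 1:])


def total_distance(words, dist):
    """Distance d: sum of the distances between consecutive spoken words."""
    return sum(dist(a, b) for a, b in zip(words, words[1:]))


def centroid(words):
    """Coordinate-wise mean of the spoken word vectors."""
    n = len(words)
    return [sum(w[k] for w in words) / n for k in range(len(words[0]))]


def density(words, animals, dist):
    """Number of animal vectors within radius R = DOE / 2 of the centroid."""
    center = centroid(words)
    radius = diameter(words, dist) / 2.0
    return sum(1 for a in animals if dist(a, center) <= radius)


# Toy example (hypothetical 2D "embeddings", Euclidean distance).
spoken = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
animal_list = [(2.0, 1.0), (10.0, 10.0), (0.0, 0.0)]
toy_max_jump = max_jump(spoken, euclidean_distance)
toy_density = density(spoken, animal_list, euclidean_distance)
```

Passing `cosine_distance` instead of `euclidean_distance` gives the cosine-based variants of the same indicators, provided no embedding vector is the zero vector.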

Hierarchy.
For each geometry, we provided its mesoscale organization of concepts by performing a semisupervised clustering, using the linear algorithm of k-means, and setting the number of clusters according to the elbow method (see Figure 5). Relying on these clustering configurations, we defined the explorative potential of the navigation as the total number of visited clusters and as the total number of words included in the visited clusters, for each subject. These descriptors report the total number of clusters potentially visitable during the test and the total number of words potentially retrievable during the test. As for the geometric indicators of the previous section, the significance of the two explorative potential metrics in discriminating the three groups of subjects has been evaluated by means of the Kolmogorov-Smirnov statistical test and the t-test, at a 95% confidence level (resulting values available in Tables 2 and 3 of Supplementary Materials) and by adjusting p values according to the Holm-Bonferroni method. By performing a clustering analysis between the embedded visited clusters, we are able to evaluate the existence of a hierarchy in the way subjects explore concepts.
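The Holm-Bonferroni adjustment applied to the p values of the tests above can be sketched as follows. This is a generic illustration of the step-down procedure, not the code used in the study; the function name and the example p values are hypothetical.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p values, in the original order.

    The smallest p value is multiplied by m, the next one by m - 1, and
    so on; a running maximum keeps the adjusted values monotone, and
    every adjusted value is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_bonferroni(raw)
```

A comparison is then declared significant when its adjusted p value falls below the chosen threshold (0.05 for the 95% level used here).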
Once the clustering configuration for each geometry has been obtained, it is possible to extract the hierarchical configuration of these clusters from the coordinates of the centroid of each cluster in the geometry. In other words, the clusters are the same as those output by k-means and, given the position of each cluster in the geometry (identified by its centroid), it is possible to define a spatial hierarchical relationship between such clusters in terms of distances between centroids.
In particular, we computed the clusters' distance matrix for each group, setting the distance between not visited clusters equal to double the maximum distance between visited clusters (i.e., a proxy for infinity); in this way, we ensure that the not visited clusters are not relevant in the hierarchical analysis of that group. With the distance matrices so computed, we ran a hierarchical clustering algorithm to discover the relationship between clusters. The hierarchical relationship among the clusters visited by a group is shown through dendrograms, while differences between hierarchies, i.e., in the way different groups explore concepts, are displayed through tanglegrams (Figure 3). Finally, we investigated the correlation between the three groups by computing Baker's Gamma correlation coefficient in pairs for the three groups' trees (dendrograms) and by testing it against a null model. For a better understanding of the clustering analysis, we summarize below what we have done in two main steps:
(1) Clustering configuration: We defined the mesoscale organization of concepts; i.e., we identified the clusters of concepts for each category (DEM, MCI, and healthy control) and for each geometry by means of a linear semisupervised clustering algorithm (k-means). These clusters represent how the expressed concepts group together in a semantic space, and they constitute the states of the Markov chains. Through k-means clustering, we also provided the explorative potential, which defines the total number of clusters explored by each category (see Figure 1 of Supplementary Materials).
(2) Hierarchical configuration of clusters: The clusters identified by the k-means algorithm are then given as input to a hierarchical clustering algorithm, which provides the spatial hierarchy of such clusters based on their relative distance. Intuitively, since the three categories explore different clusters, the study of the hierarchy gives us an insight into the way such clusters are explored. In order to detect any possible difference in the hierarchy, we compute the values of Baker's Gamma correlation, a measure of similarity between two trees (dendrograms) of hierarchical clustering (see Figure 3).
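The construction of the per-group distance matrix between cluster centroids, as described above, can be sketched as follows. Two assumptions are made for the illustration and are not stated in the text: centroids are compared with the Euclidean distance, and the "proxy for infinity" is assigned to every pair that involves at least one not visited cluster. The function name and the toy centroids are hypothetical.

```python
import math


def group_distance_matrix(centroids, visited):
    """Distance matrix between cluster centroids for one group.

    Pairs of visited clusters get their Euclidean distance (assumption).
    Any pair involving a not visited cluster gets twice the largest
    distance found between visited clusters, so that not visited
    clusters merge last in a subsequent hierarchical clustering.
    """
    k = len(centroids)
    visited = set(visited)
    matrix = [[0.0] * k for _ in range(k)]
    max_visited = 0.0
    for r in range(k):
        for s in range(r + 1, k):
            if r in visited and s in visited:
                d = math.dist(centroids[r], centroids[s])
                matrix[r][s] = matrix[s][r] = d
                max_visited = max(max_visited, d)
    far_away = 2.0 * max_visited
    for r in range(k):
        for s in range(r + 1, k):
            if r not in visited or s not in visited:
                matrix[r][s] = matrix[s][r] = far_away
    return matrix


# Toy example: four centroids in 2D, the group never visits cluster 3.
toy_centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0)]
toy_matrix = group_distance_matrix(toy_centroids, visited=[0, 1, 2])
```

A matrix built this way could then be handed to any agglomerative clustering routine that accepts precomputed distances, in order to obtain the dendrogram of that group.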

Networks.
At the macroscale, i.e., the scale of clusters of concepts, the navigation of concepts is tested by means of its Markov representation, to probe the possible alterations of mental pathways emerging from the exploration of concepts in patients with dementia. Mathematically, this corresponds to defining the transition probability from one state (i.e., cluster) to another, regardless of previously visited states. Operatively, we build three Markov chains, one for each group $g$, i.e., the two diagnoses and the healthy controls, considering all the clusters visited by each group as the states of the Markov chain of that group, and setting the transition probabilities $m$ equal to the relative transition frequencies from one state ($r$) to another ($s$) in each group $g$ as follows:
$$m^{g}_{r,s} = \frac{\sum_{p=1}^{P_g} E(r \to s)_p}{\sum_{s'=1}^{S} \sum_{p=1}^{P_g} E(r \to s')_p},$$
where $P_g$ is the total number of subjects of group $g$, $S$ is the total number of clusters visited by group $g$, and $E(r \to s)_p$ is the outgoing edge from cluster $r$ to cluster $s$ for the $p$-th patient. After calculating the entries $m^{g}_{r,s}$, we obtain the transition probability matrix $M$ for each category of subjects. Here, the assumption is that each subject of each category is considered as the "typical subject of that category" and corresponds to a possible realization of the typical exploration of that category. To clarify with an example, we considered 93 patients suffering from MCI; this means that the typical subject belonging to MCI has performed the test 93 times.
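The estimation of the transition probabilities as relative transition frequencies can be sketched as follows. The sketch assumes that each subject's word sequence has already been converted into a sequence of integer cluster labels, and that consecutive words falling in the same cluster count as a self-transition; the latter choice, the function name, and the toy sequences are assumptions made for the illustration.

```python
def transition_matrix(sequences, n_states):
    """Row-normalized transition frequencies pooled over all subjects.

    `sequences` holds one list of cluster labels (0 .. n_states - 1) per
    subject of the group. Counts of r -> s transitions are summed over
    subjects and each row is divided by its total. A state with no
    outgoing transition keeps an all-zero row.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for r, s in zip(seq, seq[1:]):
            counts[r][s] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            matrix.append([0.0] * n_states)
        else:
            matrix.append([c / total for c in row])
    return matrix


# Toy group of two subjects exploring three clusters.
toy_sequences = [[0, 1, 1, 2], [0, 2, 1]]
toy_m = transition_matrix(toy_sequences, n_states=3)
```

Pooling the counts over subjects before normalizing is what makes every subject one realization of the "typical subject" of the group.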
For practical reasons, we have transformed each transition matrix $M$ according to the PageRank algorithm; this means that the stochastic process we are assuming to model the exploration of concepts behaves 85% of the time according to the probabilities of the above determined Markov chain and 15% of the time according to a discrete uniform distribution [45][46][47][48] (for more details on the choice of the teleportation parameter in the PageRank algorithm, we refer the reader to the dedicated section in Supplementary Materials) as follows:
$$T_{r,s} = \alpha\, m^{g}_{r,s} + \frac{1 - \alpha}{S},$$
where $T$ represents the new transition matrix, $\alpha$ is equal to 0.85 according to the PageRank algorithm, and $S$ is the number of states of the Markov chain. Each Markov chain is then characterized by the steady state distribution $\vec{\pi}$ and by the mean first passage time matrix MFPT. Through the former, we gain information about the process at equilibrium, while, through the latter, we gain insight into the dynamics of the process during the exploration of concepts. Bearing in mind the memoryless property of Markov chains and that the probability of being in state $r$ after $n$ steps is the $r$-th entry of $\vec{\pi}_n = \vec{\pi}_0 T^n$, where $\vec{\pi}_0$ is the probability distribution of the initial state, the steady state corresponds to the long-run equilibrium, whatever the starting state is, as follows:
$$\pi_s = \lim_{n \to +\infty} (T^n)_{r,s}.$$
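The PageRank transformation and the long-run equilibrium described above can be sketched as follows. The steady state is approximated here by power iteration, which is one of several possible ways to compute it. Rows of the input matrix that are entirely zero (states with no observed outgoing transition) are replaced by a uniform row, as is customary in PageRank; this handling, the function names, and the toy matrix are assumptions made for the illustration.

```python
def pagerank_transform(m, alpha=0.85):
    """Mix the chain with a uniform jump: T = alpha * M + (1 - alpha) / S."""
    size = len(m)
    t = []
    for row in m:
        if sum(row) == 0:
            # Assumption: a state without outgoing transitions jumps uniformly.
            t.append([1.0 / size] * size)
        else:
            t.append([alpha * x + (1.0 - alpha) / size for x in row])
    return t


def steady_state(t, tol=1e-12, max_iter=10000):
    """Approximate the stationary distribution by iterating pi <- pi T."""
    size = len(t)
    pi = [1.0 / size] * size
    for _ in range(max_iter):
        new = [sum(pi[r] * t[r][s] for r in range(size)) for s in range(size)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Toy two-state chain.
toy_chain = [[0.5, 0.5], [0.0, 1.0]]
toy_t = pagerank_transform(toy_chain)
toy_pi = steady_state(toy_t)
```

Because every entry of the transformed matrix is strictly positive, the chain is irreducible and aperiodic, so the iteration converges to the same distribution from any starting vector.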
The steady-state distribution is found by solving the system of equations obtained by imposing
$$\vec{\pi}\, T = \vec{\pi},$$
with the constraint that all the components of $\vec{\pi}$ must sum up to 1. The steady state distribution can also be obtained by means of eigenvectors. In this case, $\vec{\pi}\, T = \vec{\pi}$ can be seen as
$$\vec{\pi}\, T = \lambda\, \vec{\pi}, \quad \lambda = 1.$$
Therefore, $\vec{\pi}$ can be obtained from the left eigenvector of the square matrix $T$ corresponding to the eigenvalue $\lambda = 1$. The MFPT is obtained from the fundamental matrix $Z$:
$$Z = (I - T + W)^{-1},$$
where $I$ is the identity matrix, and $W$ is a matrix of rows identical to $\vec{\pi}$. The MFPT is determined by
$$\mathrm{MFPT}_{r,s} = \frac{Z_{s,s} - Z_{r,s}}{\pi_s}.$$
We compare descriptors (i.e., $\vec{\pi}$ and MFPT) according to four different metrics: Pearson correlation, Spearman's correlation, covariance, and Euclidean norm of the difference; for the MFPT, we also compare its Frobenius norm. Note that, for the MFPT, we consider only the states visited in common by the groups being compared, and we treat the matrix as a vector (resulting values available in Tables 5-11 of Supplementary Materials).
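The computation of the MFPT from the fundamental matrix can be sketched as follows, using the standard relation in which the entry from state r to state s equals (Z[s][s] - Z[r][s]) / pi[s]. The sketch relies on a small Gauss-Jordan inversion so that it stays free of external libraries; the function names and the toy chain are hypothetical, and the stationary distribution is taken as given.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mean_first_passage_time(t, pi):
    """MFPT matrix from Z = (I - T + W)^-1, where every row of W equals pi."""
    n = len(t)
    a = [[(1.0 if r == s else 0.0) - t[r][s] + pi[s] for s in range(n)]
         for r in range(n)]
    z = invert(a)
    return [[(z[s][s] - z[r][s]) / pi[s] for s in range(n)] for r in range(n)]


# Toy two-state chain with its stationary distribution.
toy_t = [[0.5, 0.5], [0.075, 0.925]]
toy_pi = [3.0 / 23.0, 20.0 / 23.0]
toy_mfpt = mean_first_passage_time(toy_t, toy_pi)
```

With this formula the diagonal entries are zero; the mean recurrence time of a state, if needed, is 1 / pi[s].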
Data Availability
The data that support the findings of this study are available from Azienda Provinciale per i Servizi Sanitari (APSS), Trento, Italy, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of APSS.

Conflicts of Interest
The authors declare no competing interests.

Authors' Contributions
Elena Bravi, Monica Dallabona, Manlio De Domenico, and Stefano Merler designed the study. Barbara Benigni prepared all the figures, performed the numerical experiments, and analyzed the data. Barbara Benigni and Manlio De Domenico wrote the main manuscript text. All authors reviewed the manuscript.