Inferring Depression and Its Semantic Underpinnings from Simple Lexical Choices

Spatial demonstratives are highly frequent linguistic universals, with at least two contrastive expressions (proximal (“this”) vs. distal (“that”)) indicating physical, social, or functional proximity of the speaker to the referent object. Recent evidence based on the Demonstrative Choice Task (DCT), in which participants couple words with a spatial demonstrative with no context provided, suggests that demonstrative use is also indicative of experienced or emotional proximity to the self in an imagined mental space. As depression is characterized by increased and maladaptive focus on the self, the DCT may be a simple and reliable way to elicit behaviors that enable inference on the presence of severe depressive states and allow descriptions of the semantic characteristics of individual differences in such states. In two independent cross-sectional studies, including 775 and 879 participants, respectively, we showed that DCT-based classification models reliably capture semantic characteristics of experiential states that are predictive of self-reported depression symptom severity, as measured by PHQ-9. In both samples, DCT classifiers outperformed baseline models and replicated semantic patterns of negative affect previously observed to be associated with depression. This indicates that the paradigm captures semantic characteristics of the experiential states underlying depression symptoms and may be used to map individuals along a broad semantic space, potentially providing novel insights into individual differences in depressive states.


Introduction
Depression is characterized as a disorder of "self," involving maladaptive distortions in the experiential and narrative self [1]. These alterations are mainly characterized by increased self-focused attention and highly negative self-representation [2]. While such alterations may manifest as similar symptoms across subjects (e.g., sleep disturbances, flattened affect), there appears to be a gap between the observed or reported symptom profiles and the underlying experiential profiles of individuals with depressive disorder. A large heterogeneity in symptom profiles can be observed [3,4], and even within individuals with similar symptom profiles, there is substantial variation in disorder trajectories [5], treatment efficiency [6], and comorbidity with other psycho- and somatic pathologies [7]. While identifying the presence of standard symptoms is crucial for diagnosis, there may be important differences in the experiential states underlying these symptoms that are not captured in standard symptom scales. Means to investigate and identify characteristics of the experiential state of individuals may contribute importantly to symptom profiling approaches and provide information about the relationship between symptoms and the underlying mental experiences and potentially individual differences hereof.
The present work was aimed at investigating the extent to which differences in mental states related to depression symptom severity can be captured with a simple language task and characterized in semantic terms. It is well acknowledged that language use and language processing are highly reflective of individual differences in personality traits [8-12], gender [11], mood [13,14], stance detection [15-17], and demographic characteristics [18-21]. Differences in language use and processing have further proved to be effective markers of psychiatric symptoms, particularly in the case of depression [22-25], where even the most conservative classification models perform comparably to the standard of validated self-report scales of symptom severity [24] and clinical assessment [22]. This suggests that language features capture a substantial portion of depression symptom variance. In these models, increased use of negatively valenced word categories and first-person singular pronouns is consistently among the strongest differentiating features for depression classification [1,24,26-30], indicating increased and maladaptive self-focus. These findings reflect those observed in neuroimaging studies, where observed hyperactivation in emotional processing circuits for negative stimuli and hypoactivation for positive stimuli [31,32] are enhanced when involving self-referential emotional processing [31,33].
Recent evidence based on the Demonstrative Choice Task (DCT) indicates that the coupling of proximal ("this") and distal ("that") spatial demonstrative forms with nouns is indicative of individuals' experienced/emotional proximity to the target word. Spatial demonstratives are among the few language universals [34], and most languages have at least two forms, a proximal and a distal form, delineating a physical, functional, and social distinction between peripersonal and extrapersonal space [35]. The usage of spatial demonstratives is thus indicative of the position of the speaker relative to the referent in both a physical and an experiential (psychological) space in any given context [36,37]. The DCT [38] involves a binary forced choice between the proximal and distal demonstrative form for sequentially presented nouns. Each noun is presented in isolation, leaving no contextual anchors. In a large-scale DCT study, Rocca and Wallentin [38] showed that choices of proximal/distal demonstratives were highly structured across participants. Results indicated that demonstrative choices were structured according to the semantic characteristics of the items, where, for instance, words scoring high on features such as fearful, harm, unpleasant, and angry were associated with more distal demonstrative responses, while nouns scoring high on needs, pleasant, happy, and self elicited higher proportions of proximal demonstrative forms. These findings hold across English and Spanish [39] and Danish and Italian [37] and suggest that choice of demonstrative forms not only reflects contextually bound clues about proximity in physical space but also carries information about the position of individuals in a nonphysical semantic space.
Capturing important dimensions concerning self-focused mental representations, the DCT may encode information relevant to inferring the presence of severe depression and may be a simple and effective tool to identify and study the structure of semantic representations underlying depression and other disorders of the self. The present study investigated whether maladaptive mental states related to depression can be reliably captured and described using the DCT. We hypothesized that (a) predictive models can reliably identify individuals with high depression symptom scores based on representations of their behavior in the DCT and (b) words that are most predictive in this task will map onto semantic dimensions that are traditionally associated with depression-related alterations of the experiential self (e.g., negative valence). The significance of these results would be twofold. First, the DCT may provide additional assessment tools for depression whose added value lies in not directly priming towards reflection on depressive symptoms, potentially reducing biases in self-report. Second, the DCT may make it possible to characterize individual differences in disorder states within clinical groups, providing insights into individuals' specific experiential profiles in ways that are not captured by standard symptom scales.
The study was conducted in two independent samples to assess robustness and replicability of the results. The replication procedure was preregistered prior to conducting study 2 (https://osf.io/bqhyg/).

Materials and Methods
The submitted study adheres to the procedure described in the preregistered protocol, adding only a few elements for further data scrutiny (see details in Supplementary Materials). The project was approved by the Institutional Review Board at Aarhus University.
2.1. Participants. The experiments were conducted on the online platform Prolific (https://www.prolific.co). All participants were native English speakers (age ≥ 18 years). No other inclusion criteria were defined. Subjects were excluded if they fulfilled at least one of three criteria indicating low effort: (1) reaction time (RT) below 300 ms in more than 10% of the trials, (2) response (button) entropy below 0.80, indicating a consistent response pattern irrespective of the stimuli (see entropy equation in Supplementary Materials), and (3) more than 3 of 15 failed attention checks.
Study 1 included 1004 participants, of which 201 subjects were excluded due to missing data in either task or questionnaire responses.Additionally, 28 subjects were excluded based on the three low effort criteria, yielding a final sample of 775 participants (gender: 352 female, 412 male, 10 nonbinary, and 1 other; age: 159 were 18-29 years, 211 were 30-39 years, 147 were 40-49 years, 149 were 50-59 years, 107 were 60+ years, and 2 did not report age).
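The three low-effort exclusion criteria can be illustrated with a minimal Python sketch. The entropy equation itself is given only in the Supplementary Materials, so the Shannon entropy in bits used here is an assumption, as are the function names, the argument names, and the representation of responses as button labels.

```python
import math


def response_entropy(responses):
    """Shannon entropy (in bits) of the observed response labels.

    Assumption: the paper's entropy equation is in its Supplementary
    Materials; plain Shannon entropy is used here for illustration.
    """
    n = len(responses)
    entropy = 0.0
    for label in set(responses):
        p = responses.count(label) / n
        entropy -= p * math.log2(p)
    return entropy


def is_low_effort(rts_ms, responses, failed_checks,
                  rt_floor_ms=300, max_fast_share=0.10,
                  min_entropy=0.80, max_failed_checks=3):
    """Return True if a participant meets at least one exclusion criterion."""
    # Criterion 1: RT below 300 ms in more than 10% of the trials.
    fast_share = sum(rt < rt_floor_ms for rt in rts_ms) / len(rts_ms)
    # Criterion 2: response entropy below 0.80.
    low_entropy = response_entropy(responses) < min_entropy
    # Criterion 3: more than 3 failed attention checks.
    too_many_failures = failed_checks > max_failed_checks
    return fast_share > max_fast_share or low_entropy or too_many_failures
```

Under this sketch, a participant who presses the same button on every trial has an entropy of 0 and is flagged, whereas an evenly split response pattern has an entropy of 1 bit and passes the second criterion.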

Demonstrative Choice Task (DCT).
Participants completed a 300-item Demonstrative Choice Task (DCT) adapted to the purpose from Rocca and Wallentin [38] (see Supplementary Experimental Procedures; full stimulus list in Figure S2). For each trial, an English noun was presented on the screen and participants were to match it with either a proximal ("this") or distal ("that") demonstrative form by clicking one of two buttons presented below the stimulus (Figure 1). Trial order was randomized for each subject.

Depression and Anxiety
Participants were unaware of the purpose of the study. They were informed that there were no incorrect answers and instructed to respond based on their immediate preference (Figure S1).
For details on the experimental procedure, see Supplementary Materials.
2.2.2. 9-Item Patient Health Questionnaire (PHQ-9). Depression symptom severity was measured with the 9-item Patient Health Questionnaire (PHQ-9) [40]. PHQ-9 is a self-administered version of the PRIME-MD diagnostic instrument and measures each of the 9 DSM-IV criteria for depression on a 4-point Likert scale ranging from 0 ("not at all") to 3 ("nearly every day"). The PHQ-9 instrument has demonstrated robust validity and reliability [41,42], as well as sensitivity and specificity [43]. A sum score ≥ 10 (corresponding to moderate depression) was defined as the threshold for classification of participants into the control or depression group.
[...] shown to be associated with language usage [11,18] as well as correlated with depression prevalence [44], and the second model addressed whether accounting for these variables could improve model performance. We additionally investigated whether performance of the DCT classifier improved if the training sample was restricted to subjects exhibiting test-retest reliability above 70% (see Supplementary Materials, Table S1 and Table S2).
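As a small illustration of the PHQ-9 scoring rule described in this subsection (nine items scored 0 to 3, summed, with a cutoff at 10), the grouping can be sketched as follows. The function names and the group labels are hypothetical and chosen only for this example.

```python
def phq9_sum(item_scores):
    """Sum score of the nine PHQ-9 items, each rated 0 to 3."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each PHQ-9 item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def phq9_group(item_scores, threshold=10):
    """Assign the depression group when the sum score reaches the threshold."""
    return "depression" if phq9_sum(item_scores) >= threshold else "control"
```

The sum score ranges from 0 to 27, so a participant answering 1 on every item (sum 9) falls just below the cutoff.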
Performance of the DCT classifiers was compared to two baseline models: one including only gender and age as predictors of the outcome group (mGenderAge) and one random baseline trained to classify a randomly shuffled version of the outcome group from DCT responses (mRandomBaseline). All models were trained on 70% of the data and evaluated on 30% of the data, stratified by the outcome group. Model performance was evaluated on out-of-sample classification accuracy, balanced between sensitivity (true positive rate) and specificity (true negative rate), and ROC AUC scores. Accuracy rate along with 95% confidence intervals for this rate was computed with a binomial test. p values for classification performance were computed with a one-sided test, evaluating whether performance was better than the no information rate, taken to be the largest class percentage in the data.
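The evaluation quantities named above can be sketched in plain Python. This is an illustrative sketch rather than the authors' analysis code: it assumes an exact one-sided binomial test of the number of correct predictions against the no information rate, and all function names and the toy labels are hypothetical.

```python
from math import comb


def accuracy(y_true, y_pred):
    """Share of test cases classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred, positive="depression"):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def no_information_rate(y_true):
    """Largest class proportion in the data."""
    return max(y_true.count(label) for label in set(y_true)) / len(y_true)


def p_value_vs_nir(n_correct, n_total, nir):
    """One-sided exact binomial p value: P(X >= n_correct) when accuracy equals the NIR."""
    return sum(
        comb(n_total, k) * nir ** k * (1 - nir) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Toy held-out labels (not data from the study).
y_true = ["depression"] * 4 + ["control"] * 6
y_pred = ["depression"] * 3 + ["control"] + ["control"] * 5 + ["depression"]
nir = no_information_rate(y_true)
n_correct = sum(t == p for t, p in zip(y_true, y_pred))
p_value = p_value_vs_nir(n_correct, len(y_true), nir)
```

Testing against the no information rate rather than against 50% matters when the groups are imbalanced, because a model that always predicts the larger class already reaches that rate.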
Data sensitivity analysis was performed for each model with nonparametric bootstrapping, evaluating robustness of model performance and feature importance to random data variation (see Supplementary Materials for details).
Neither the random baseline model nor the demographic baseline model performed better than chance on classification of the depression group (Table 1 and Figure S5). ROC-AUC curves and confusion matrices are reported in Supplementary Materials (Figure S5 and Figure S6).
Bootstrapped data sensitivity analysis indicated that these patterns are robust to random variation in the data (Figure 2).

Semantic Effects.
The fifty strongest positive and negative predictive DCT items for each model are visualized in Figure 3. A negative regression effect indicates that participants in the depression group were more likely than individuals in the control group to respond with a proximal demonstrative for the given item, while a positive regression effect indicates that they were more likely to respond with a distal demonstrative compared to the control group. Post hoc semantic analysis of the word effects in the best model (mDCT+Demo) showed a positive relationship between DCT item scores on semantic features of trust, valence, dominance, and joy and classification weights in the model. Conversely, results showed a negative relationship between DCT item classification weights and scores on the features disgust, anger, sadness, arousal, and fear (Figure S7, Table S3, Figure S8, and Table S4 in Supplementary Materials). These results indicate that participants in the depression group tended to respond with a proximal demonstrative more often for highly negatively valenced words, while the opposite is true for highly positively valenced words.
Neither the random baseline model nor the demographic baseline model performed better than chance on classification of the depression group (Table 2 and Figure S9). ROC-AUC curves and confusion matrices of all models are reported in Supplementary Materials (Figure S9 and Figure S10).
Bootstrapped data sensitivity analysis showed that the patterns observed are robust to random data-induced variance (Figure 4).

Semantic Effects.
The fifty strongest positive and negative predictive DCT items for each model are visualized in Figure 5. Post hoc semantic analysis of the word effects in the best model (mDCT+Demo) showed a positive relationship between DCT item scores on the semantic features valence and dominance and item classification weights.
Conversely, there was a negative relationship between item scores on the features sadness, surprise, arousal, and fear and item classification weights (Figure S7 and Figure S8 in Supplementary Materials).

Correlation of Feature Importance in Study 1 and Study 2.
The Pearson correlation of the bootstrapped word effects between study 1 and 2 was 0.35 (p < .001) for the mDCT model and 0.27 (p < .001) for the mDCT+Demo model (Figure S11 and Figure S12 in Supplementary Materials). In comparison, word effects of the mRandomBaseline model exhibited a correlation between study 1 and 2 of -0.08 (p = 1) (Figure S13).
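For readers who want to reproduce this kind of cross-study comparison, the Pearson correlation between two vectors of item-level effects can be computed as in the sketch below. The function name is hypothetical, and the inputs are assumed to be two equally long lists in which position i refers to the same DCT item in both studies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)
```

A positive value means that items with stronger effects in one study tended to have stronger effects in the same direction in the other.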

Semantic Subject Profiles
Post hoc analyses were conducted to assess whether subject-wise semantic representations of DCT behavior were predictive of depression symptom severity. Each subject was ascribed a score on each of the 11 semantic features in the NRC-VAD lexicon [45,46], calculated as the product of responses (-1 or 1) for each item and the item score on each semantic feature. Each participant was thus represented by a semantic vector of size 11, where low feature scores indicate larger proportion of proximal demonstrative choices for words scoring high on these dimensions, while high feature scores indicate larger proportions of distal demonstrative choices. A linear Bayesian model (BRM) was fitted for each semantic feature as predictor of the continuous PHQ-9 sum score, to evaluate the relationship between the semantic profile and depression symptom severity. Each BRM was estimated with 4 chains and 2000 iterations.
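The construction of a subject's semantic vector can be sketched as follows. Two points are assumptions rather than statements from the text: responses are coded -1 for proximal and +1 for distal (inferred from the direction of the feature scores described above), and the per-item products are averaged across items. The function name and the toy item scores are hypothetical and are not values from the NRC-VAD lexicon.

```python
def semantic_profile(responses, item_features):
    """Compute one score per semantic feature for a single participant.

    responses: dict mapping item -> -1 (proximal) or +1 (distal)
    item_features: dict mapping item -> dict of feature -> item score
    Assumption: products are aggregated as the mean across items.
    """
    features = sorted({f for scores in item_features.values() for f in scores})
    profile = {}
    for feature in features:
        products = [
            responses[item] * item_features[item][feature]
            for item in responses
        ]
        profile[feature] = sum(products) / len(products)
    return profile


# Toy example with two items and two features (hypothetical scores).
toy_features = {
    "grief": {"sadness": 0.9, "joy": 0.0},
    "party": {"sadness": 0.1, "joy": 0.8},
}
toy_responses = {"grief": -1, "party": 1}  # "this grief", "that party"
toy_profile = semantic_profile(toy_responses, toy_features)
```

In this toy case the sadness score is negative because the high-sadness word was paired with the proximal form, which is the pattern the text associates with low feature scores.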
For both study 1 and 2, results indicated a negative effect of sadness, fear, disgust, and anger on PHQ-9 sum score (Figure 6), indicating that more proximal demonstrative choices on words scoring high on these dimensions predicted higher PHQ-9 scores.Further, results showed a positive effect of joy, trust, and valence in study 1, indicating that more proximal demonstrative choices for words scoring high on these dimensions predicted lower PHQ-9 sum score.Posterior distributions for these positive effects in study 2 were in the same direction but overlapped with zero (Table S5 and Table S6).

Discussion
The present studies found that a simple lexical task, the DCT, elicits behaviors that can be used to infer self-reported depression status with classification accuracy ranging between 62% and 66% across two independent samples. Additionally, the DCT replicated semantic patterns of negative affect previously observed to be associated with depression [14,24,29,47,48]. Demonstrative choices for items scoring high on negative valence were consistently the strongest predictors in both studies, where proximal choices were predictive of the depression group and distal choices were predictive of the control group.
These results indicate that the DCT may be a useful tool to assist assessment of depression symptom severity, as it captures differences reliably related to depression in an indirect manner, i.e., without directly asking about depression. Such an approach may reduce potential biasing effects of meta-reflections in overt self-report symptom rating scales. While the accuracy of the models was not as impressive as that observed for large social media-based classification models, the models are reliable across samples and recover semantic effects associated with depression. This indicates that it may be possible to adapt the model in ways that would optimize predictive performance, for instance, by assigning higher weights to particular word categories expected to have more predictive power, or replacing items found to be noninformative. The semantic effects observed in the present study demonstrate the ability of the DCT to capture differences in semantic representations that are associated with self-reported depression symptoms. While we investigated semantic effects of the 11 emotional NRC-VAD features, expanding the paradigm to include a broader range of semantic features (i.e., not restricted to valence) could provide novel insights into the experiential states of depression. Similarly, the potential to create semantic subject profiles may allow investigations into individual differences in depressive states. The present study identified semantic characteristics shared across individuals with high depression symptom severity; however, it is likely that some semantic dimensions capture general depressive states shared across all patients (e.g., valence), while other dimensions may be descriptive of states that differ significantly between patients. Semantic categories such as social, body, money, and responsibility, for instance, have exhibited strong relationships with individual differences in personality traits [11,12] and may capture important aspects of individual differences in depressive experience. By mapping individuals along a broad set of semantic dimensions based on DCT behavior and computing individual deviations along each feature with respect to the group norm, the paradigm may improve our understanding of both homogeneous and heterogeneous characteristics of the experiential profiles underlying clinical symptoms. Further, such an approach may be extended to individual differences in other maladaptive states such as those associated with psychosis or personality disorders.
While the results show that structures in DCT behavior reliably relate to self-reported depression symptom severity, there are substantial amounts of unmodeled variation in behavior across individuals. Some variation is to be expected, as the task is binary and involves responses based on intuition rather than explicit reflective decisions. Additionally, the paradigm is sensitive to task context and transient mental states (e.g., participants may be more likely to respond with a proximal demonstrative for the item "Friday" if the task is conducted on a Friday). Thus, a series of trials is necessary for stable patterns to emerge. Importantly, some of the transient effects in DCT behavior may reflect important within-subject dynamics that are psychologically relevant (e.g., frequent mood changes or recurring anxiety states). Such sources of variation could potentially be dissociated from random noise in longitudinal DCT studies and would likely improve inference at the individual level. Additionally, obtaining a more comprehensive semantic feature space along which DCT items can be scored would allow model inference based on semantic dimensions rather than individual items, reducing the impact of item-specific random variation on model performance.
The models presented in this work are based on self-reported depression symptoms, as indicated by the PHQ-9. While the PHQ-9 is a commonly used instrument to assess depression symptom severity, it is not a diagnostic tool and does not allow conclusions on the presence of clinical depression. Future work should aim to validate these models in a sample of clinically diagnosed patients. It is a general challenge that classification models can never be better than the objective against which they are evaluated. Subjective verbal reports and assessments of symptom severity are fundamental to clinical depression assessments and diagnoses, which are the gold standard of evaluation of any other model. Thus, DCT-based classification models cannot perform better than standard scales in identifying depression. What the present results suggest is that the paradigm may be a useful complementary tool to standard diagnostic procedures, as it captures information related to the presence of depression symptoms and allows semantic analyses that could provide a more nuanced picture of individual disorder states.

Conclusions
The present results demonstrated that a simple lexical choice task reliably captures semantic characteristics of experiential states that are predictive of depression symptom severity across two independent samples. Future work may allow the mapping of individual differences in disorder states along a diverse set of semantic features and provide new insights into the specific experiential profiles underlying clinical symptoms and potential individual differences hereof.

4.2. Classification Performance. The two DCT-based models performed significantly better than chance on classification of the depression group (Table

Figure 1: Demonstration of two trials in the DCT. In each trial, one noun is presented together with two response options. Response options change positions (4 different configurations) at random. Each stimulus is presented until the participant presses a button. Trials are separated by a fixation cross presented for 1000 ms.

Figure 3: Study 1: the 50 DCT items with strongest positive (green) and negative (blue) regression coefficients for (a) mDCT and (b) mDCT+Demo. Positive effects indicate that choosing the proximal demonstrative for the given item increases the likelihood of being classified as control case. Negative regression effects indicate that choosing the proximal demonstrative for the given item increases the likelihood of being classified as depression case.

Figure 5: Study 2: the 50 DCT items with strongest positive (green) and negative (blue) regression coefficients for (a) mDCT and (b) mDCT+Demo. Positive effects indicate that choosing the proximal demonstrative for the given item increases the likelihood of being classified as control case. Negative regression effects indicate that choosing the proximal demonstrative for the given item increases the likelihood of being classified as depression case.