How Differing Audiences Were Associated with User Emotional Expression on a Well-Being App

In the last five years, there has been an explosion of mobile apps that aim to impact emotional well-being, yet limited research has examined the ways that users interact, and specifically write to develop a therapeutic alliance, within these apps. Writing is a developmental practice in which a narrator transforms amorphous thoughts and emotions into expressions, and according to narrative theory, the linguistic characteristics of writing can be understood as a physical manifestation of a narrator's affect. Informed by literacy theorists who have argued convincingly that narrators address different audiences in different ways, we used IBM Watson's Natural Language Processing software (IBM Watson NLP) to examine how users' expression of emotion on a well-being app differed depending on the audience. Our findings demonstrate that audience was strongly associated with the way users expressed emotions in writing. When writing to an explicit audience, users wrote longer narratives, with less sadness, less anger, less disgust, less fear, and more joy. These findings have direct relevance for researchers and well-being app design.


Introduction
In the last five years, there has been an explosion of apps designed to support emotional well-being, yet limited research has examined the ways that users interact, and specifically write to develop a therapeutic alliance, within these apps [1,2]. Informed by literacy and psychological research that has illuminated the ways that writers address different audiences in different ways [3][4][5][6][7][8][9], the current analysis examined how users expressed emotion in a well-being app when they were writing to an explicit as compared to an implied audience.
In 2014 alone, there were over 100,000 apps available to provide laypeople with health information [10]. By 2017, this number had increased to approximately 325,000 [11]. The mobile app studied here was designed to enhance mental well-being and, like many mental health apps, relied on a chatbot to deliver therapeutic content. One way the chatbot interacted was to send messages that encouraged users to share their thoughts and feelings in writing. Chatbots have been defined as digital tools "that use machine learning and artificial intelligence methods to mimic humanlike behaviours and provide a task-oriented framework with evolving dialogue able to participate in conversation" [12]. Self-disclosure to a chatbot may have many of the same positive psychological outcomes as self-disclosure to a person [13], and people appear to interact with bots in much the same way that they interact with other humans [9].
Most studies of commercially available mental health apps focus on whether the app increases well-being or decreases mental health symptoms [14,15]. However, researchers have yet to identify the emotional and cognitive processes by which mental health apps may influence well-being. According to narrative and psychological theory, the linguistic characteristics of a narrator's writing are a physical manifestation of their affect [16][17][18][19]. Thus, writing is a developmental practice in which a narrator transforms amorphous thoughts and emotions into expressions. In fact, a recent study showed that the expression of high rates of negative affect in writing was associated with higher neuroticism and depression and lower overall health, and conversely, positive affect was associated with higher extraversion, agreeableness, and overall health, and with lower self-reported neuroticism and depression [20].
However, narrative expression is not fixed but relational, and narrators express thoughts and emotions differently when addressing distinct audiences [16,21]. Researchers have found that when a specific and active audience is addressed, it potentially influences writers' compositions more than when writers are directed to write for a general audience that offers no feedback [5,22,23,24]. In one such study, first-year college students used more cognitive (e.g., think, know, and realize) and intensifying expressions when randomly assigned to a blogging activity where peers could interact and comment on the blog posts, as compared to a control group of students who wrote to an imagined peer [7]. The current study is aimed at identifying how users' expression of emotion on a well-being app differed depending on the audience.

Method
Participants and Procedure.
The app of study, LiveBetter, is a mobile application designed to promote emotional well-being. It is free and publicly available for Apple and Android devices. An update was pushed to all app users with the option to opt in or out of research. Data was pulled for users who did not opt out. This initial data was deidentified (email addresses and birthdates were removed) and pulled from the server on July 16, 2019. The data was then cleaned to only include users who had completed at least one in-app activity and who had written at least three words in response to this activity. This dataset was provided to the authors for analysis. Columbia University's Institutional Review Board determined this was Not Human Subjects Research. The final sample included 1269 app users from five continents who wrote at least three words on the mobile application between 2017 and 2019. All written content and chatbot interactions were in English.

Presence and Strength of Emotion.
Over the course of studying human emotion, philosophers and psychologists have proposed between three and eleven primary emotions [25]. All these proposals include fear, anger, and sadness, and most include joy as well. IBM Watson, the software selected to code for the presence and strength of emotions in this study, identifies sadness, fear, anger, joy, and disgust. Disgust was a useful addition as it can be understood as an intersection of contempt and remorse, on the pathway between loathing and boredom. Therefore, fear, anger, sadness, joy, and disgust were the five emotions examined in this study.

IBM Watson and Natural Language Understanding.
IBM Watson has been used to analyze sentiment and emotion in a wide range of writing contexts and has been found to have high reliability, validity, and efficiency [26,27]. The dataset was narrowed to include only activities with user generated text content. These activities were eventually ordered within each user by timestamp to include each participant's first written interaction with the mobile app. This text was then analyzed using IBM Watson Natural Language Understanding (NLU). Natural Language Processing (NLP) refers to the automatic process of extracting information from spoken or written text. Natural Language Understanding was used in this context to determine the strength of emotion (sadness, fear, anger, joy, and disgust) present in the participant response to the activity.
One example of this process would be the example user input, "I think that there are flaws with my body and how others perceive me though I know that's my own thought projection." This input resulted in the following output for emotion strength: sadness: 0.418, joy: 0.236, fear: 0.306, disgust: 0.09, and anger: 0.088. Each emotion was coded for strength on a scale of 0-1 with 1.0 being the strongest possible emotion. To complete analysis, Watson NLU requires at least three words per response which it then automatically compares to a preexisting library of language that has been coded for presence and strength of emotion.
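To make the shape of this output concrete, the following minimal sketch stores the five scores reported for the example input and checks them against the 0-1 scale. It does not call IBM Watson; the function names, the flat dictionary layout, and the whitespace-based word count are illustrative assumptions rather than the service's actual interface.

```python
# Emotion scores reported in the text for the example input (scale 0-1).
example_scores = {
    "sadness": 0.418,
    "joy": 0.236,
    "fear": 0.306,
    "disgust": 0.09,
    "anger": 0.088,
}

EMOTIONS = ("sadness", "fear", "anger", "joy", "disgust")


def validate_scores(scores):
    """Check that all five emotions are present and each score lies in [0, 1]."""
    missing = [name for name in EMOTIONS if name not in scores]
    if missing:
        raise ValueError(f"missing emotions: {missing}")
    for name in EMOTIONS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} score out of range: {scores[name]}")
    return scores


def strongest_emotion(scores):
    """Return the name of the emotion with the highest strength score."""
    return max(EMOTIONS, key=lambda name: scores[name])


def has_minimum_words(text, minimum=3):
    """Three-word minimum from the text; splitting on whitespace is an assumption."""
    return len(text.split()) >= minimum
```

For the example input, the strongest coded emotion is sadness, followed by fear and joy.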
The dataset was then imported into an SQL database which was used to organize user message length by character count. We then coded the activities within the app based on intended audience and identified two types of audience, writing to an implied audience and writing to an explicit audience. One example of an activity prompt that would be coded as "implied audience" was "What feeling did you get from your happy memory?" An example of an activity prompt that was coded as "explicit audience" was "What would you tell a friend if they had your worry?" Potential "explicit audiences" included in writing prompts could be friends, partners, family members, or work colleagues.
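The data preparation described in this section (keeping responses of at least three words, taking each user's first written activity by timestamp, and measuring message length in characters) can be sketched as follows. This is an illustration in plain Python rather than the SQL used in the study; the `Response` fields, the ISO 8601 timestamp format, and the choice to take the earliest response that meets the word minimum are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Response:
    user_id: str    # hypothetical deidentified user key
    timestamp: str  # assumed ISO 8601 in one time zone, so string order is time order
    audience: str   # "implied" or "explicit", as coded from the activity prompt
    text: str


def first_written_responses(responses, min_words=3):
    """Keep each user's earliest response that has at least `min_words` words."""
    first = {}
    for response in sorted(responses, key=lambda r: r.timestamp):
        if len(response.text.split()) < min_words:
            continue  # too short to be analyzed
        first.setdefault(response.user_id, response)  # keep only the earliest
    return first


def length_in_characters(response):
    """Message length as a character count."""
    return len(response.text)
```

Each retained response carries its audience code, so later comparisons can group the first responses by "implied" versus "explicit".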

Statistical Methods.
Prevalence of sociodemographic characteristics, device type (Apple or Android), length of writing, and the strength of emotion were evaluated using basic descriptive statistics. For the categorical variables, we showed the frequency table that has counts and percentages. For the continuous variables, we presented the following: number of observations, mean, standard deviation, max, min, the first quartile (Q1), and the third quartile (Q3).
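As an illustration of the continuous-variable summary listed above, the sketch below computes the same quantities with Python's standard library. The quartile convention (the "inclusive" method of `statistics.quantiles`) is an assumption; the source does not state which convention was used, and other conventions can give slightly different Q1 and Q3 values.

```python
import statistics


def describe(values):
    """Summarize a continuous variable: n, mean, SD, min, Q1, Q3, max."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }
```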
We used chi-square tests to explore if the sociodemographic characteristics were dependent on the type of audience, either writing to an implied or explicit audience. We used a series of Wilcoxon-Mann-Whitney tests to examine if writing length and strength of emotion differed by explicit or implied audience.
Six separate multivariate linear regression models were built to find how writing length and emotion scores are related to audience adjusted by gender and year. Since neither length of writing nor emotion scores are normally distributed, we took the natural logarithm for these outcome variables.
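When an outcome is log-transformed, the exponentiated regression coefficient can be read as a multiplicative ratio rather than an additive difference. The sketch below shows this for the simplest case only: a single 0/1 audience indicator with no adjustment for gender or year, where the least-squares slope equals the difference between the two group means of the log outcome. The function name is hypothetical, and the sketch assumes strictly positive outcome values, since the logarithm of zero is undefined.

```python
import math


def log_ratio_estimate(explicit, implied):
    """Unadjusted slope of log(outcome) on a 0/1 audience indicator.

    With one binary predictor, the ordinary least-squares slope equals the
    difference between the group means of the log-transformed outcome.
    """
    mean_log_explicit = sum(math.log(v) for v in explicit) / len(explicit)
    mean_log_implied = sum(math.log(v) for v in implied) / len(implied)
    beta = mean_log_explicit - mean_log_implied
    return beta, math.exp(beta)  # exp(beta) is a ratio of geometric means
```

A ratio above 1 means the explicit-audience group has the larger geometric mean of the outcome; a ratio below 1 means it has the smaller one.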
The Shapiro-Wilk test was used to test normality and provided sufficient evidence that writing length and strength of emotions for each audience type were not normally distributed. Median values and the 25th and 75th percentiles of each dependent variable were presented by audience. A series of Mann-Whitney U tests were performed to examine whether writing length and strength of emotions differed by audience. The Hodges-Lehmann estimate and 95% confidence interval of the difference between the median values of the two audience types were presented. Effect sizes were calculated by dividing the absolute standardized test statistic z by the square root of the number of pairs: $r = |z|\,n^{-1/2}$.
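The nonparametric comparison described above can be sketched in plain Python as follows. This is a minimal illustration, not the software used in the study: it uses a tie-corrected normal approximation without continuity correction, takes n in the effect size as the total number of observations across both groups (an assumption about how n was defined), and computes the Hodges-Lehmann point estimate as the median of all pairwise differences, omitting its confidence interval.

```python
import math
from statistics import median


def midranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney(x, y):
    """Return U, z, two-sided p (normal approximation), and r = |z| / sqrt(n)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    sd = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    z = (u - n1 * n2 / 2) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    r = abs(z) / math.sqrt(n)             # effect size
    return u, z, p, r


def hodges_lehmann(x, y):
    """Median of all pairwise differences x_i - y_j."""
    return median(xi - yj for xi in x for yj in y)
```

Here `x` would hold the values for one audience type and `y` the values for the other, so a positive Hodges-Lehmann estimate indicates larger values in `x`.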

Results
Most participants used Apple devices (75.37%) and were female (61.15%). Most were located in North America (68.67%), followed by Europe (16.02%), Asia (7.1%), Australia (5.92%), and Africa (1.74%), as noted in Table 1. Geography was determined by time zone, which was captured from app user metadata. In terms of audience, most of the writing was coded as "writing to implied audience" (92.59%).
Across the 1269 app users, the average length of writing was 18.04 words, the average sadness score was 0.25, the average anger score was 0.11, the average fear score was 0.15, the average disgust score was 0.08, and the average joy score was 0.42. There was no significant dependency between sociodemographic characteristics and audience type, apart from time (Table 2). Writing length and emotions were significantly associated with audience type (Table 3). Fitting linear regression models, we found that when writing to an explicit audience as compared to writing to an implied audience, users wrote longer narratives (CI 1.32-1.99, p < 0.0001) and wrote with less sadness (CI 0.33-0.58, p < 0.0001), less fear (CI 0.23-0.43, p < 0.0001), less anger (CI 0.34-0.62, p < 0.0001), more joy (CI 1.83-3.53, p < 0.0001), and less disgust (CI 0.24-0.45, p < 0.0001) (Table 4).
The two-sided p values of the Mann-Whitney U tests were all less than 0.0001, which indicated that writing length and emotions differed significantly by audience type (Table 3). The effect sizes were all less than 0.3, which would be considered small effects according to Cohen's classification of effect sizes. Hodges-Lehmann estimates indicated that the median difference (95% CI) in length between writing to an explicit audience (other) and writing to an implied audience (self) was 6.00 (3.00, 9.00), the median difference (95% CI) in sadness was -0.08 (-0.12, -0.05), the median difference (95% CI) in anger was -0.03 (-0.04, -0.01), the median difference (95% CI) in fear was -0.04 (-0.06, -0.03), the median difference (95% CI) in disgust was -0.02 (-0.03, -0.01), and the median difference (95% CI) in joy was 0.31 (0.23, 0.40).
Table note: All analyses include only each user's first completed activity, from users who wrote at least 3 words and wrote in English.
Human Behavior and Emerging Technologies

Discussion
Our findings demonstrate that audience was strongly associated with the way users expressed emotions when writing in a well-being app. When writing to an explicit audience, users wrote longer narratives, with less sadness, less anger, less disgust, less fear, and more joy. This has direct relevance for researchers and well-being app designers, as eliciting greater positive emotion may contribute to improved well-being and physical health.
The current work is the first to our knowledge to demonstrate how the audience impacts a narrator's expression of affect in the context of a well-being app. However, more generally, the importance of audience has been long documented in narrative and psychological research [5,7,8,21]. Further, narrative and psychological theory posit that affective expression is in fact a physical representation of emotional experience [16][17][18][19]. Recent empirical evidence supports this theoretical position and demonstrates that affective expression in writing corresponds with emotional experiences such that increased use of positive emotions was correlated with higher well-being and better physical health [20]. Therefore, creators of chatbot-based well-being interventions may consider prompting users to write to an explicit audience to elicit more content and stronger positive emotion.
Future research must continue to examine how writing in well-being apps impacts psychological well-being over time. For instance, might writing to explicit others, and thus using more positive emotions (e.g., joy), increase well-being over time, and could writing about sadness, fear, and disgust be linked with decreased well-being? Conversely, might addressing and expressing negative emotions through writing help people work through psychosocial challenges over time and thus experience increased well-being in the long run?

Limitations.
One limitation of the current study was that many more people wrote to an implied audience than to an explicit audience. Therefore, it is possible that there may be a lack of generalizability. Another limitation was that we only analyzed the first writing activity, and it is possible that emotions would differ over time by audience. Future research should examine the differences in writing based on audience and how writing on this platform may change over time.
Another limitation of the current study was that most of the users were in North American or European time zones and were using Apple devices, and thus, the generalizability of our findings may be limited. Three major benefits of mobile app-based interventions are their scalability, affordability, and potential to overcome stigma [28,29]. Further studies with broader global populations and more detailed demographics, including age, race, and socioeconomic measures, could properly assess the generalizability of these findings and the appropriateness of this intervention across cultures and genders.

Conclusion
As well-being apps become increasingly popular [11], it is necessary to ensure that app developers consider how specific writing prompts within the app elicit distinct psychological processes. The stark differences depending on audience identified in our analysis suggest that audience impacts the ways that narrators express emotion. Therefore, researchers and app designers must carefully consider the impact of audience when designing well-being apps. Accordingly, developers who aim to foster writing with more positive affect should consider using explicit as opposed to implied audiences in their writing prompts. App-based well-being interventions should give special consideration to the wording of questions and specifically to the use of an explicit or implied audience in written prompts.

Data Availability
We can provide deidentified data upon request.

Disclosure
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Institutes of Health.

Conflicts of Interest
The authors declare that they have no conflicts of interest.