With the pervasive increase in social media use, the explosion of user-generated data provides a potentially very rich source of information, which plays an important role in helping online researchers understand user behavior in depth. Since a user’s personality traits are the driving force of the user’s behaviors, in this paper, along with social network features, we first extract linguistic features, emotional statistical features, and topic features from users’ Facebook status updates, followed by quantifying the importance of each feature via the Kendall correlation coefficient. Next, on the basis of the weighted features and dynamically updated thresholds of personality traits, we deploy a novel adaptive conditional probability-based predicting model, which considers prior knowledge of the correlations between users’ personality traits, to predict users’ Big Five personality traits. In the experimental work, we explore the existence of correlations between users’ personality traits, which provides better theoretical support for our proposed method. Moreover, on the same Facebook dataset, compared to other methods, our method can achieve an
As a new medium for information dissemination, social networks have become a novel means of social interaction. Hence, users’ individual behaviors have gradually become key factors in social network analysis. Besides, although some users post desirable images of themselves and their lives on social media for self-presentation, which reflects a somewhat “untrue” self, users’ contributions and activities, which can be instantly made available to the entire social network [
Psychologists believe that a user’s personality traits are the driving force of the user’s behaviors, and that individual differences in personality traits may have an impact on users’ online activities [
Nevertheless, confronted with the same problem as stated in [
(1) Demonstrate the existence of interdependencies between users’ personality traits.
(2) On the basis of social network features, linguistic features, emotional statistical features, and topic features, put forward a novel unsupervised adaptive conditional probability-based framework for predicting users’ personality traits that takes prior knowledge of the correlations between personality traits into consideration.
(3) Exploit the correlations between features and users’ personality traits via the Kendall correlation coefficient, so as to quantify the importance of each feature.
(4) Update the threshold of each personality trait dynamically rather than adopting a unified threshold.
The rest of the paper is organized as follows: Section
As research on users’ behaviors in social networks has become a hot topic, personality recognition of users has received a significant amount of attention in both theory and practice. Argamon et al. [
The other one extended personality-related features with linguistic cues. On the basis of the corpus derived from essays written by students at the University of Texas at Austin [
Mairesse et al. [
However, several drawbacks can be identified in previous work on predicting users’ personality traits: (1) some researchers assumed that there was little or no correlation between users’ personality traits [
In this section, we present the prediction model adopted in our work. We first define the preliminary features (Section
As a person’s unique pattern of long-term thoughts, emotions, and behaviors, personality traits are reflected in a user’s attitudes towards things and in the actions the user takes. Therefore, aside from the social network features provided in the Facebook dataset [
The Facebook dataset includes eight social network features, namely, the date of the user’s registration, network size, ego betweenness centrality, normalized ego betweenness centrality, density, brokerage, normalized brokerage, and transitivity, which reflect a user’s behavior patterns solely through the user’s network structure. Similar to the cluster assumption [
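These features come precomputed in the dataset. To make the less familiar ones concrete, the sketch below computes network size, density, brokerage, normalized brokerage, and ego betweenness for a single ego network under common textbook definitions: density over alter-alter ties only, brokerage as the number of alter pairs with no direct tie, and ego betweenness as the ego’s share of the shortest paths joining those pairs. These definitions, and the hypothetical function name `ego_network_features`, are assumptions; the formulas behind the dataset’s columns may differ (for example, in whether ties to the ego count toward density).

```python
from itertools import combinations


def ego_network_features(alters, alter_edges):
    """Ego-network statistics under common textbook definitions (a sketch).

    alters: ids of the ego's direct contacts; the ego itself is implicit.
    alter_edges: (u, v) ties between alters; every alter is also tied to the ego.
    """
    alters = sorted(set(alters))
    adj = {a: set() for a in alters}  # adjacency among alters only
    for u, v in alter_edges:
        if u in adj and v in adj and u != v:
            adj[u].add(v)
            adj[v].add(u)

    n = len(alters)
    pairs = n * (n - 1) // 2
    ties = sum(len(nbrs) for nbrs in adj.values()) // 2

    brokerage = 0  # pairs of alters with no direct tie between them
    ego_betweenness = 0.0  # ego's share of the shortest paths joining those pairs
    for j, k in combinations(alters, 2):
        if k not in adj[j]:
            brokerage += 1
            # j and k are two steps apart: one path runs through the ego and
            # one through each alter adjacent to both, so the ego gets 1/(1 + c).
            ego_betweenness += 1.0 / (1 + len(adj[j] & adj[k]))

    return {
        "network_size": n,
        "density": ties / pairs if pairs else 0.0,
        "brokerage": brokerage,
        "normalized_brokerage": brokerage / pairs if pairs else 0.0,
        "ego_betweenness": ego_betweenness,
    }


# Alters a and b are not tied to each other but share the contact c.
print(ego_network_features(["a", "b", "c"], [("a", "c"), ("b", "c")]))
```

In a pure star (no ties among alters) every alter pair depends on the ego alone, so ego betweenness equals the number of alter pairs and normalized brokerage is 1.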
Since each user exhibits a particular mode of expression, some researchers hold that personality traits are significantly correlated with spoken or written linguistic cues [
A natural language parser is used to work out the grammatical structure of sentences, for instance, which groups of words go together as “phrases” and which words are the subject or object of a verb. Probabilistic parsers try to produce the most likely analysis of new sentences by leveraging knowledge of language gained from hand-parsed sentences. Stanford Parser (
A user’s attitudes towards things, which reveal differences in personality traits, reflect the user’s unique pattern of long-term emotions. For instance, a neurotic person may tend to experience unpleasant emotions easily, such as anger, anxiety, depression, and vulnerability. Hence, a user’s emotion statistics can serve as characteristics in the personality trait prediction model. In this paper, a user’s emotional statistical characteristics include the proportions of positive words and negative words used in the user’s posts. On the basis of adjectives and their variants obtained in Section
As the user’s emotional statistical features, the positive and negative emotional statistical characteristics of a user are defined as
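A minimal sketch of how such proportions could be computed, under several assumptions: the parser output uses Penn Treebank tags, so "adjectives and their variants" are read as tokens tagged JJ, JJR, or JJS; words are matched against a positive and a negative lexicon; and each proportion is taken over all of the user’s adjectives. The denominator, the toy lexicons, and the hypothetical function name `emotion_statistics` are illustrative choices, not the paper’s definition.

```python
# Penn Treebank adjective tags: base (JJ), comparative (JJR), superlative (JJS).
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}


def emotion_statistics(tagged_posts, positive_lexicon, negative_lexicon):
    """Return (positive_ratio, negative_ratio) for one user.

    tagged_posts: the user's posts, each a list of (word, tag) pairs.
    Assumption: both ratios are taken over all adjectives the user wrote.
    """
    adjectives = [
        word.lower()
        for post in tagged_posts
        for word, tag in post
        if tag in ADJECTIVE_TAGS
    ]
    if not adjectives:
        return 0.0, 0.0
    n_pos = sum(1 for w in adjectives if w in positive_lexicon)
    n_neg = sum(1 for w in adjectives if w in negative_lexicon)
    return n_pos / len(adjectives), n_neg / len(adjectives)


# Toy lexicons for illustration only.
POSITIVE = {"happy", "happier", "great", "best"}
NEGATIVE = {"sad", "angry", "worse", "anxious"}

posts = [
    [("Feeling", "VBG"), ("happier", "JJR"), ("today", "NN")],
    [("such", "JJ"), ("an", "DT"), ("angry", "JJ"), ("crowd", "NN")],
    [("Best", "JJS"), ("day", "NN"), ("ever", "RB")],
]
print(emotion_statistics(posts, POSITIVE, NEGATIVE))  # prints (0.5, 0.25)
```

Here four adjectives are found (happier, such, angry, best), two of which are positive and one negative. A real system would use an established sentiment lexicon and would likely lemmatize words before the lookup.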
The things a user focuses on may have an impact on the actions the user takes. Take openness, one of the Big Five personality traits, as an example: it reflects a person’s degree of intellectual curiosity, creativity, and preference for novelty and variety. Therefore, we mined the themes each user is concerned with from the user’s status updates via LDA (Latent Dirichlet Allocation) [
Since our purpose is to extract all the themes a user is concerned with, rather than the specific themes of each post, we merged all microblogs of a user into one document and then extracted the user’s themes; that is, each document corresponds to one user. The results of the LDA model are shown as follows:
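A minimal sketch of this merge-then-model procedure, written from scratch so that it is self-contained: posts are grouped by user into one document each, and a small collapsed Gibbs sampler for LDA returns each user’s topic proportions. The LDA implementation, number of topics, hyperparameters (`alpha`, `beta`), iteration count, and preprocessing used in the paper are not given here, so those values, the whitespace tokenizer, and the hypothetical names `merge_posts_by_user` and `lda_gibbs` are all assumptions.

```python
import random
from collections import defaultdict


def merge_posts_by_user(posts):
    """posts: iterable of (user_id, text). Returns {user_id: token list}."""
    docs = defaultdict(list)
    for user, text in posts:
        docs[user].extend(text.lower().split())  # naive whitespace tokenizer
    return dict(docs)


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns {doc_id: topic proportions}."""
    rng = random.Random(seed)
    vocab = sorted({w for toks in docs.values() for w in toks})
    w_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = {d: [0] * n_topics for d in docs}  # doc-topic counts
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics  # tokens assigned to each topic
    z = {d: [] for d in docs}  # current topic of every token
    for d, toks in docs.items():
        for w in toks:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, toks in docs.items():
            for i, w in enumerate(toks):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment before resampling it.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return {
        d: [(ndk[d][t] + alpha) / (len(toks) + n_topics * alpha) for t in range(n_topics)]
        for d, toks in docs.items()
    }


posts = [
    ("u1", "football match tonight great goal"),
    ("u1", "what a goal in the football final"),
    ("u2", "new canvas for the art gallery"),
    ("u2", "painting art on canvas all day"),
]
theta = lda_gibbs(merge_posts_by_user(posts), n_topics=2, iters=100)
for user, dist in theta.items():
    print(user, [round(p, 2) for p in dist])
```

With only a handful of toy posts the topics carry little meaning; the point is the data flow, with one document, and hence one topic distribution, per user.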
Every feature has a different impact on the prediction of users’ personality traits. It is important to allocate feature weights reasonably, so that good predictions can be made based only on scant knowledge of personality traits. The Kendall test is a nonparametric hypothesis test that uses a correlation coefficient to test the statistical dependence of two random variables. Since the feature values and personality trait scores in the dataset we used are numeric, we quantified the importance of each feature by analyzing the relevance between users’ personality traits and the features via the Kendall correlation coefficient, treating feature values and personality trait scores as two random variables. The Kendall correlation coefficient is calculated as
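For reference, a pairwise computation of Kendall’s coefficient is sketched below. It assumes the tau-b variant, which corrects for ties (likely when many users share the same feature value or trait score), and its O(n²) loop is fine for a few hundred users; `scipy.stats.kendalltau`, whose default variant is tau-b, is a faster alternative. The `feature_weights` helper turns |τ| into weights that sum to 1; that normalization, the feature names, and the data are hypothetical illustrations, not the paper’s weighting scheme.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length numeric sequences (O(n^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from numerator and denominator
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + tied_x_only)
                 * (concordant + discordant + tied_y_only))
    # A constant variable leaves tau undefined; treat it as no association.
    return (concordant - discordant) / denom if denom else 0.0


def feature_weights(features, trait_scores):
    """Hypothetical weighting: each feature's |tau| with the trait, normalized to sum to 1."""
    taus = {name: kendall_tau_b(values, trait_scores) for name, values in features.items()}
    total = sum(abs(t) for t in taus.values())
    if total == 0:
        return {name: 1.0 / len(taus) for name in taus}
    return {name: abs(t) / total for name, t in taus.items()}


trait = [1.5, 2.0, 3.5, 4.0]
features = {
    "network_size": [120, 180, 300, 410],
    "negative_ratio": [0.4, 0.3, 0.2, 0.1],
    "topic_3": [0.1, 0.3, 0.2, 0.4],
}
print(feature_weights(features, trait))
```

Taking the absolute value means a strongly negative correlation (such as `negative_ratio` here, τ = −1) counts as just as informative as a strongly positive one.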
Figure
The architecture of the adaptive conditional probability-based predicting model for users’ personality traits.
Before proposing our prediction algorithm, we conducted experiments to analyze the correlations between users’ personality traits, which are presented in detail in Section
In psychology, the Big Five personality traits are five broad domains or dimensions of personality that are used to describe human personality. The theory based on the Big Five factors is called the Five Factor Model (FFM) [
Our method aimed to predict user
Secondly, we sorted user
Thirdly, we selected a
Then, we calculated the error rate of each personality trait, and the error rate of
Finally, if the set of unvisited personality traits UP was empty and the error rate of each personality trait fell within a specified range, the algorithm terminated; else if UP was not empty, we continued to predict another personality trait in UP; else if there was a personality trait
Input: user
Output: user
(1) / otherwise
(2) For each personality trait
(3) set temporary error rate
(4) End for
(5) For each personality trait
(6) set initial threshold
(7) End for
(8)
(9)
(10) For each feature
(11) calculate
(12) End for
(13) For user
(14) calculate
(15) End for
(16) sort
(17)
(18) For each personality trait
(19)
(20) End for
(21) While
(22)
(23) If
(24) calculate
(25) End if
(26) Else if
(27)
(28)
(29)
(30)
(31) End if
(32) Else if
(33)
(34)
(35) End while
(36)
(37) For each personality trait
(38) calculate
(39) If
(40)
(41)
(42) update
(43)
(44) End if
(45) End for
(46) If
(47) Go to step (21)
(48) End if
(49) Else if
(50) Go to step (52)
(51) End if
(52) Return
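Purely to illustrate the idea of a per-trait threshold that is updated from observed errors, rather than one unified threshold for all traits, the sketch below uses a hypothetical update rule: a user is predicted positive for a trait when the user’s weighted score reaches the threshold; the threshold moves down when missed positives outnumber false alarms and up in the opposite case, and the step is halved whenever the direction flips. The rule, step size, tolerance, and both function names are assumptions, and the conditional-probability part of the model is not reproduced here.

```python
def error_rate(scores, labels, threshold):
    """Share of users misclassified when 'positive' means score >= threshold."""
    wrong = sum((s >= threshold) != y for s, y in zip(scores, labels))
    return wrong / len(labels)


def update_threshold(scores, labels, threshold, step=0.1, tolerance=0.05, max_rounds=50):
    """Hypothetical per-trait threshold update (not the paper's exact rule).

    Lower the threshold while missed positives outnumber false alarms, raise it
    in the opposite case, and halve the step each time the direction flips.
    Stops once the error rate is within `tolerance` or the two error types balance.
    """
    direction = 0
    for _ in range(max_rounds):
        fp = sum(s >= threshold and not y for s, y in zip(scores, labels))
        fn = sum(s < threshold and y for s, y in zip(scores, labels))
        if (fp + fn) / len(labels) <= tolerance or fp == fn:
            break
        new_direction = -1 if fn > fp else 1
        if direction and new_direction != direction:
            step /= 2  # overshot: refine the step size
        direction = new_direction
        threshold += direction * step
    return threshold, error_rate(scores, labels, threshold)


# Weighted scores for one trait and the users' true levels (True = positive).
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
labels = [False, False, False, True, True, True]
print(update_threshold(scores, labels, threshold=0.9, tolerance=0.0))
```

Starting from 0.9, the threshold steps down until it separates the two groups (ending near 0.5 with zero errors). Running the update separately for each trait is what distinguishes it from a single unified threshold.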
In this section, we first describe the dataset used in our experiments. Then we analyze the interdependent relationships between users’ personality traits. Finally, we conduct experiments on different kinds of features and compare our method with other methods on the same dataset.
MyPersonality (
Previous works usually predicted users’ personality traits without considering the interdependencies between them. In this context, we investigated whether there were relationships between users’ personality traits. Since not all users had the same level of interactions, we first grouped users according to the levels of their personality traits. For example, if two users both have a positive level of agreeableness, they are placed in the same group; if one user has a positive level of agreeableness and the other a negative level, they are not. Then, we explored the cooccurrences between personality traits in each group, as shown in Figure
The proportion of users who have a certain pair of personality traits simultaneously. PEXT, PNEU, PAGR, PCON, and POPN stand for positive level of extraversion, neuroticism, agreeableness, conscientiousness, and openness. NEXT, NNEU, NAGR, NCON, and NOPN stand for negative level of extraversion, neuroticism, agreeableness, conscientiousness, and openness.
It can be observed that a significant proportion of users have a negative level of extraversion, a positive level of openness, or a positive level of agreeableness. It is also noteworthy that a positive level of extraversion overlaps relatively little with negative levels of openness, conscientiousness, and agreeableness. Likewise, a positive level of neuroticism overlaps relatively little with positive levels of conscientiousness and agreeableness and with a negative level of openness. Furthermore, only a small number of users have positive levels of extraversion and neuroticism simultaneously.
However, the above cooccurrences may be due to the a priori statistics of each trait. For instance, if more users have a negative level of neuroticism, a positive level of openness, and a positive level of agreeableness than other levels, then the cooccurrences of negative neuroticism and positive openness, negative neuroticism and positive agreeableness, and positive openness and positive agreeableness are also likely to involve more users than the others, as shown in Figure
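This concern can be checked numerically: for each pair of levels, compare the observed share of users who have both with the share expected if the two traits were independent, that is, the product of the two marginal shares. A pair whose observed share merely matches the expected one reflects how common each level is, not a dependency between traits. The 1/0 encoding of positive/negative levels, the toy data, and the hypothetical function name `cooccurrence_vs_independence` are assumptions for illustration.

```python
from itertools import combinations


def cooccurrence_vs_independence(levels):
    """levels: {trait: list of 1 (positive level) or 0 (negative level), one per user}.

    For each pair of traits and each pair of levels, return (observed share of users
    with both levels, share expected if the two traits were independent).
    """
    traits = sorted(levels)
    n = len(levels[traits[0]])
    table = {}
    for a, b in combinations(traits, 2):
        for la in (1, 0):
            for lb in (1, 0):
                both = sum(1 for x, y in zip(levels[a], levels[b]) if x == la and y == lb)
                share_a = levels[a].count(la) / n
                share_b = levels[b].count(lb) / n
                table[(a, la, b, lb)] = (both / n, share_a * share_b)
    return table


levels = {
    "AGR": [1, 1, 1, 0, 1, 1],
    "NEU": [0, 0, 1, 1, 0, 0],
    "OPN": [1, 1, 0, 1, 1, 1],
}
for (a, la, b, lb), (observed, expected) in cooccurrence_vs_independence(levels).items():
    tag = ("P" if la else "N") + a + " & " + ("P" if lb else "N") + b
    print(f"{tag}: observed {observed:.2f}, expected {expected:.2f}")
```

The printed labels follow the PEXT/NEXT naming used in the figure caption; a large gap between observed and expected shares is what would point to a genuine interdependency.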
Kendall correlation coefficients between personality traits.
 | | EXT | | NEU | | AGR | | CON | | OPN | |
---|---|---|---|---|---|---|---|---|---|---|---|
EXT | | | | 0.22 | −0.03 | 0.09 | −0.15 | −0.06 | −0.13 | 0.07 | 0.01 |
 | | | | | −0.18 | 0.04 | 0.04 | 0.04 | 0.18 | 0.07 | 0.22 |
NEU | | | | | | −0.09 | −0.17 | 0.10 | −0.21 | −0.03 | 0.27 |
 | | | | | | −0.14 | −0.13 | 0.02 | −0.01 | −0.06 | −0.10 |
AGR | | | | | | | | −0.07 | 0.07 | 0.16 | |
 | | | | | | | | −0.08 | 0.16 | 0.06 | 0.02 |
CON | | | | | | | | | | −0.01 | −0.10 |
 | | | | | | | | | | 0.07 | −0.07 |
OPN | | | | | | | | | | | |
As can be seen from Table
Moreover, this is inconsistent with Figure
Since there are some contradictions between Figure
Jensen-Shannon divergence between personality traits.
 | | EXT | | NEU | | AGR | | CON | | OPN | |
---|---|---|---|---|---|---|---|---|---|---|---|
EXT | | | | 0.58 | 14.79 | 0.44 | 2.19 | 0.70 | 2.36 | 0.73 | 0.84 |
 | | | | 4.79 | 4.49 | 5.21 | 2.36 | 5.61 | 1.91 | 10.43 | 2.11 |
NEU | | | | | | 0.80 | 2.12 | 1.10 | 2.46 | 2.98 | |
 | | | | | | 19.78 | 2.94 | 16.97 | 3.05 | | 3.61 |
AGR | | | | | | | | 0.77 | 3.29 | 0.82 | 0.91 |
 | | | | | | | | 3.40 | 0.86 | 4.63 | 1.25 |
CON | | | | | | | | | | 1.15 | 1.17 |
 | | | | | | | | | | 6.16 | 1.40 |
OPN | | | | | | | | | | | |
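A minimal sketch of the Jensen-Shannon divergence between two discrete distributions, JS(P, Q) = ½ KL(P ‖ M) + ½ KL(Q ‖ M) with M = ½(P + Q). Unlike the Kullback-Leibler divergence it is symmetric and always finite. The sketch assumes the two distributions are given as probability vectors over the same bins, for example normalized histograms of two traits’ scores; that input format, the data, and the function names are hypothetical.

```python
from math import log


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in nats: symmetric, finite, at most ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i > 0 wherever p_i > 0 or q_i > 0, so both KL terms are finite.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical normalized histograms of two traits' scores over the same 5 bins.
p = [0.10, 0.20, 0.40, 0.20, 0.10]
q = [0.05, 0.15, 0.30, 0.30, 0.20]
print(round(js_divergence(p, q), 4))
```

With natural logarithms the divergence lies between 0 (identical distributions) and ln 2 (disjoint supports); with base-2 logarithms it lies between 0 and 1.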
In keeping with Figure
In addition, Soto et al. [
Since most researchers working on personality recognition used different measures to evaluate their experiments on various datasets, it was hard to compare their performance and quality comprehensively. However, a common dataset annotated with gold-standard personality labels was available in the Workshop on Computational Personality Recognition (Shared Task); participants were free to split the training and test sets as they wished, and precision, recall, and
First, we carried out experiments with our proposed method. Scores of precision, recall, and
Precision, recall, and
Trait | Precision | Recall | F1 |
---|---|---|---|
EXT | 0.93 | 0.76 | 0.83 |
NEU | 0.86 | 0.67 | 0.75 |
AGR | | 0.74 | 0.83 |
CON | 0.89 | 0.70 | 0.78 |
OPN | 0.91 | 0.79 | 0.84 |
Mean | 0.91 | 0.732 | 0.806 |
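Assuming the third metric is the usual F1 score, the harmonic mean of precision and recall, the per-trait scores and the "Mean" row can be reproduced with a few lines. Each trait is treated as a binary label (positive or negative level), and the mean is a plain average over the five traits; the recall values reported for the proposed method (0.76, 0.67, 0.74, 0.70, 0.79) do average to the 0.732 shown. The function names and the toy labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 for the positive class (True)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if p and not t)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_mean(per_trait):
    """Average each metric over traits, i.e. the 'Mean' row of a results table."""
    return [sum(column) / len(per_trait) for column in zip(*per_trait)]


# One trait: true levels vs. predicted levels for five users.
print(precision_recall_f1([True, True, False, False, True],
                          [True, False, False, True, True]))

# Recall reported for the proposed method (EXT, NEU, AGR, CON, OPN).
print(round(macro_mean([(0.76,), (0.67,), (0.74,), (0.70,), (0.79,)])[0], 3))  # prints 0.732
```

Averaging per-trait F1 values (macro averaging) is not the same as computing F1 from the averaged precision and recall, so a "Mean" F1 should be read as the former.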
In this section, we conducted experiments on social network features, linguistic features, emotional statistical features, topic features, and all features, respectively. Due to space restrictions, Figure
It can be observed that experiments conducted on linguistic features, emotional statistical features, and topic features yield unsatisfactory performance compared with social network features, and that social network features achieve the best classification performance for extraversion, which is consistent with the conclusion in [
Additionally, we conducted experiments on unweighted features. Here, we present only the most successful results in Table
Precision, recall, and
Trait | Precision | Recall | F1 |
---|---|---|---|
EXT | 0.85 | 0.76 | 0.82 |
NEU | 0.92 | 0.68 | 0.76 |
AGR | | 0.64 | 0.76 |
CON | 0.91 | 0.57 | 0.68 |
OPN | 0.87 | | |
Mean | 0.90 | 0.686 | 0.77 |
From Tables
In addition, we predicted users’ personality traits with unified thresholds. The results are summarized in Table
Precision, recall, and
Trait | Precision | Recall | F1 |
---|---|---|---|
EXT | 0.62 | 0.54 | 0.57 |
NEU | | 0.16 | 0.25 |
AGR | 0.57 | 0.73 | 0.63 |
CON | 0.57 | 0.77 | 0.60 |
OPN | 0.70 | | |
Mean | 0.652 | 0.64 | 0.574 |
The multilabel learning task is to predict one or more categories for each instance. In existing algorithms, such as the PT5 method proposed by Tsoumakas and Katakis [
As mentioned in Section
Precision of different methods on Facebook dataset. The best performance per personality trait appears boldfaced.
Related works | Methods | EXT | NEU | AGR | CON | OPN | Mean |
---|---|---|---|---|---|---|---|
Our work | CP | 0.93 | 0.86 | | 0.89 | 0.91 | 0.91 |
Verhoeven et al. [ | SVM | 0.79 | 0.71 | 0.67 | 0.72 | 0.87 | 0.752 |
Farnadi et al. [ | SVM, kNN, NB | 0.58 | 0.54 | 0.50 | 0.55 | 0.60 | 0.554 |
Alam et al. [ | SVM, BLR, mNB | 0.58 | 0.59 | 0.59 | 0.59 | 0.60 | 0.590 |
Tomlinson et al. [ | LR | NA | NA | NA | NA | NA | NA |
Recall of different methods on Facebook dataset. The best performance per personality trait appears boldfaced.
Related works | Methods | EXT | NEU | AGR | CON | OPN | Mean |
---|---|---|---|---|---|---|---|
Our work | CP | 0.76 | 0.67 | 0.74 | 0.70 | 0.79 | 0.732 |
Verhoeven et al. [ | SVM | | | 0.68 | | | |
Farnadi et al. [ | SVM, kNN, NB | 0.61 | 0.53 | 0.50 | 0.54 | 0.70 | 0.576 |
Alam et al. [ | SVM, BLR, mNB | 0.58 | 0.58 | 0.59 | 0.59 | 0.60 | 0.588 |
Tomlinson et al. [ | LR | NA | NA | NA | NA | NA | NA |
Related works | Methods | EXT | NEU | AGR | CON | OPN | Mean |
---|---|---|---|---|---|---|---|
Our work | CP | 0.83 | 0.75 | 0.83 | 0.78 | 0.84 | 0.806 |
Verhoeven et al. [ | SVM | 0.79 | 0.70 | 0.67 | 0.72 | | 0.748 |
Farnadi et al. [ | SVM, kNN, NB | 0.56 | 0.52 | 0.50 | 0.54 | 0.61 | 0.546 |
Alam et al. [ | SVM, BLR, mNB | 0.58 | 0.58 | 0.58 | 0.59 | 0.60 | 0.586 |
Tomlinson et al. [ | LR | NA | NA | NA | NA | NA | 0.630 |
From Tables
In this paper, we studied the problem of exploiting interdependencies between users’ personality traits to predict those traits. First, after analyzing the importance of features, we conducted experiments on the Facebook dataset to demonstrate the existence of correlations between users’ personality traits. Bearing this in mind, we then proposed an unsupervised framework, the adaptive conditional probability-based predicting model, to predict users’ Big Five personality traits based on feature importance, dynamically updated thresholds of personality traits, and prior knowledge about the correlations between personality traits. Furthermore, we compared our results with those achieved by others in the Workshop on Computational Personality Recognition (Shared Task) on the same dataset. In general, the experimental results demonstrate the effectiveness of our proposed framework.
In future work, we will explore ways to reduce the time complexity of our framework so that it can be better applied to big data environments such as Facebook monitoring. Besides, to make our framework more applicable to dynamic networks, we will explore combining time series analysis with the personality trait prediction algorithm to capture the dynamic evolution of information and network structure. Furthermore, since social network features (described in Section
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by the National Natural Science Foundation of China under Grant no. 61300148; the Scientific and Technological Break-Through Program of Jilin Province under Grant no. 20130206051GX; the Science and Technology Development Program of Jilin Province under Grant no. 20130522112JH; the Science Foundation for China Postdoctor under Grant no. 2012M510879; and the Basic Scientific Research Foundation for the Interdisciplinary Research and Innovation Project of Jilin University under Grant no. 201103129.