The development and popularity of microblogs have made sentiment analysis of tweets and Weibo posts an important research field. However, the characteristics of microblog messages pose challenges for sentiment analysis and mining. Existing approaches mostly focus on message content and context information. In this paper, we propose a novel microblog sentiment analysis framework that incorporates a social interactive relationship factor into the content-based approach. By exploring the interactive relationships on the social network based on posted messages, we build a social interactive model to represent opposition or acceptance behavior. Based on this model, the sentiment of microblog messages with sparse emotion terms can be deduced and identified, and the sentiment uncertainty can be alleviated to some extent. We then transform the classification problem into an optimization problem. Experimental results on a Weibo data set indicate that the proposed method outperforms the baseline methods.
With the advent of Web 2.0, users have become more eager to publish and share their opinions in various domains on social networks such as Twitter and Weibo. These opinions reflect people’s sentiments and views across areas as diverse as commercial products, services, and public events, and they can influence decision making related to individual behavior and public policy. For example, analyzing consumers’ sentiment towards a certain brand or released product can help companies improve their marketing campaigns, product design, and user experience. Likewise, understanding voters’ opinions towards a candidate can help a campaign refine its political strategy [
However, microblog messages are usually short, noisy, and ambiguous due to their informal style of expression. These characteristics present a challenge for text sentiment analysis. Recently, many studies have addressed the microblog sentiment classification problem. Generally speaking, previous works are mostly based on the content-based approach, such as the lexicon-based method and the corpus-based method [
Existing research has shown that the content-based method is effective for microblog messages with strong sentiment expressions. However, a message sometimes contains no obvious sentiment terms yet still carries an implicit emotional tendency. In this situation, the pure content-based approach is inadequate to identify the sentiment orientation. In other words, the sparse sentiment terms in microblog messages make it difficult to classify the hidden emotion. Moreover, as a microblog message is disseminated, its sentiment characteristics may not be discriminative enough to classify its emotion polarity. In view of this, some scholars have considered incorporating hidden social relationships, such as friend and follower relations, into the message content to address the sparse sentiment terms problem [
Meanwhile, in practice people may disagree with others in their friend circle because they differ in social cognition, knowledge structure, and experience. That is to say, one may oppose or accept the opinion reflected in a message posted by someone else. This opposition or acceptance behavior also reflects the emotion polarity of the participants in the microblog environment. In this paper, this opposition or acceptance behavior is regarded as a social interactive relationship. By leveraging this kind of interactive relationship, it may be feasible to infer the sentiment of messages for participants and build the learning model. In this work, we exploit the social relationship between users on the microblog platform and incorporate this factor into the content-based analysis process. By exploiting the social interactive relationship among microblog users, we build a voting-based social interactive model that links the microblog participants based on the message context. The sentiment classification problem can then be transformed into an optimization problem that integrates the content-based strategy and the social interactive relationship factor. Experimental results on microblog message data sets validate the effectiveness and efficiency of our method.
This paper is structured as follows. In Section
In recent years, microblog messages, as an opinion-rich resource, have made it possible to mine and analyze people's emotions at different scales. Microblog sentiment orientation identification has gained huge popularity and attracted researchers’ attention across many fields. In this section, an overview of works related to microblog sentiment analysis and classification is presented.
Previous works usually compare the text content of a given message with a lexicon or a dictionary to classify its sentiment and calculate its strength. For instance, SentiWordNet contains as many as 200,000 entries to match each word with positive, negative, or objective scores. Mostafa et al. propose a lexicon-based method to analyze consumers’ sentiment towards some commercial brands [
As tweets are usually short and more ambiguous, Jiang et al. [
In fact, besides the content-based information, the hidden social relationships in microblog messages can also be captured and used in the sentiment analysis process. Recently, many works have focused on social relationships in the microblog space and leveraged them to improve classification accuracy [
The purpose of sentiment orientation identification for microblogs is to build a program which can automatically identify whether a given microblog message is expressing positive, negative, or no sentiment. In other words, this problem can be defined as follows: given a collection of microblog message set
However, the characteristics of microblog messages, which are noisy, short, and ambiguous, pose a challenge for sentiment analysis and mining. In addition, informal language and the use of emoticons make the problem more difficult. If the message to be analyzed contains distinct sentiment vocabulary, it is not hard to classify its sentiment by means of an established semantic lexicon or dictionary. However, when a message carries a sentiment orientation but contains no obvious sentiment vocabulary, the content-based approach often cannot determine the implicit emotion and may yield an incorrect sentiment estimate. In particular, during information dissemination through forward, comment, and @ operations, the original microblog message may change its sentiment to some extent and may even undergo polarity reversal. Let us consider two examples as follows.
Well, Jane’s opinion towards genetically modified food is so. What do you think?
@Jack, I do not agree with your views about genetically modified food.
As described above, in Example
Because of their social network nature, microblog messages carry not only explicit text content but also implicit social interactive relationships. On a microblog website, a user publishes a message to express his/her opinion towards events, public figures, or commercial products, and this message may be forwarded and commented on by other people who are interested in it. Through these social interactive operations, the opinion implied in a message diffuses through cyberspace and influences many participants. Social interactive operations therefore play a major role in information dissemination and can reflect the emotions of the participants. Moreover, as a microblog message diffuses, social interactive operations may enhance, weaken, or even reverse the emotion hidden in the original message. In other words, social interaction is a key factor in the dissemination and evolution of microblog messages.
In practice, some people usually hold an objective and comprehensive opinion in their published messages, while some people may keep a subjective or extreme opinion towards the evaluated objects, such as events or other people. As a matter of common sense, the posted messages of the former microblog users should obtain more support or acceptance from other people. And the latter ones will receive criticisms for their one-sided opinions. The support or opposition can be seen as a vote from other people to the publisher of certain message. If the pattern of social interactive relationship can be obtained, it is possible to infer and predict the sentiment of microblog messages. Based on the observation, we consider leveraging this kind of social interactive relationship in the process of microblog sentiment classification.
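The voting view described above can be illustrated with a minimal sketch. It assumes each interaction has already been labeled as support (+1) or opposition (-1); the function names `build_vote_table` and `net_support`, the tuple layout, and the sample users are hypothetical and are not taken from the authors' model. The sketch only shows how votes between users might be accumulated, not how the paper extracts them from messages.

```python
from collections import defaultdict

def build_vote_table(votes):
    """Accumulate signed votes between users.

    votes: iterable of (voter, author, sign), where sign is +1 for
    support/acceptance and -1 for opposition (an assumed encoding).
    Returns a dict mapping (voter, author) to the summed vote.
    """
    table = defaultdict(int)
    for voter, author, sign in votes:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 (support) or -1 (opposition)")
        if voter == author:
            continue  # ignore self-votes in this sketch
        table[(voter, author)] += sign
    return dict(table)

def net_support(table):
    """Sum the votes each author has received from all other users."""
    totals = defaultdict(int)
    for (_, author), value in table.items():
        totals[author] += value
    return dict(totals)

# Hypothetical interactions among three users A, B, and C.
votes = [
    ("B", "A", 1),   # B supports a message posted by A
    ("C", "A", 1),   # C supports a message posted by A
    ("A", "C", -1),  # A opposes a message posted by C
    ("B", "C", -1),  # B opposes a message posted by C
]
table = build_vote_table(votes)
support = net_support(table)
```

In this toy data, user A receives two supporting votes and user C receives two opposing votes, so the net support is positive for A and negative for C, which mirrors the intuition that balanced opinions attract acceptance while one-sided opinions attract criticism.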
Let us demonstrate the principle by a toy example in Figure
A toy example for social interactive relationship.
From the social interactive relationship depicted in Figure
More specifically, element
Next, we extract the interactive relation from the messages posted in the data set. Given a microblog message data set,
An example of microblog message matrix representation.
Next we can define the problem of sentiment orientation identification in this work. Consider a microblog message data set
Inspired by the work [
The elements in matrix
In this paper, we employ the
Moreover, it is necessary to integrate the aforementioned social interactive relationship factor into the learning process depicted in formula (
So, the sentiment classification can be solved by the synthetical formula:
Each experiment is repeated ten times independently and the average results are reported. The experiments are conducted on a microblog message data set crawled from Sina Weibo. The data set consists of 3.7 million messages and also includes the corresponding social network. First, we conduct experiments to verify the performance of our proposed SSTI (Sparse Sentiment Terms Identification) approach by comparing it with common existing sentiment classification approaches, namely LS (Least Squares) and SVM (Support Vector Machine). The experimental results are shown in Figure
Sentiment identification on microblog messages.
We also investigate parameter
The influence of parameter
Next, we will discuss the guidelines for how to choose an appropriate value of parameter
Finally, we explore the runtime efficiency of our proposed approach. The experiments are run on an Intel(R) Core i3-3110M CPU with 4.00 GB RAM in the Matlab R2010b environment. The runtime efficiency results are shown in Figure
The runtime efficiency of our approach.
Sentiment classification of microblog messages is an important research area. By classifying and analyzing sentiments on microblogs, one can understand people’s attitudes towards particular topics. However, the messages to be analyzed sometimes do not contain enough emotion terms. Sparse sentiment terms in microblog messages pose a challenge to content-based sentiment classification methods. To address this problem, we propose a novel notion of social interactive relationship based on microblog messages and incorporate it into the existing content-based approach. After modeling the social interactive relationship matrix from the user-message matrix, we build a sentiment identification learning model to analyze the emotion of a given message. Experiments demonstrate that our proposed approach can improve sentiment classification performance significantly.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by the National Natural Science Foundation of China (Grant nos. 61402360 and 61401015).