Application of the Deep Pretrained Language Model Processing Method in Social Network Sentiment Analysis

In social networks, users can manage their social connections and social identity, publish information on various topics, and obtain information published by other users through friend relationships. The resulting large volume of text data has attracted more and more scholars to study it. Text sentiment analysis has become a hot spot in social network data analysis and has important application value in the academic, social, and business fields. Based on the idea of pretraining, this paper first improves the random word masking algorithm of the deep pretraining task in the BERT (Bidirectional Encoder Representations from Transformers) model to improve the efficiency and stability of model pretraining. Second, a new pretraining task of original sentence judgment is designed to enable the model to measure the degree of sentence smoothness, so that the BERT model can better understand the semantics of context. By referring to the idea of the attention mechanism, a deep learning framework with attention weights added into gated convolution is constructed, and a special attention weight method is adopted to enhance semantic information. Gated convolution and the attention mechanism are then combined to model aspect-related semantic information and the complete semantic information of the text. Finally, in the emotion classifier layer, the Softmax function is used to perform negative, positive, and neutral classification and to calculate the emotion classification result. By applying the optimized convolutional neural network and recurrent neural network to single-task and multitask settings in practice, the feasibility of applying them to social network sentiment analysis is verified.


Introduction
Sentiment analysis of social networks, also known as opinion mining, uses natural language processing technology to mine users' views, attitudes, and emotions in social networks [1]. Many researchers have conducted sentiment analysis on the contents of social networking sites [2] and have mined the emotions and potential opinions of users on these sites. At present, this research has broad application scenarios. For example, enterprises can accurately obtain customer feedback on products, further improve product quality according to that feedback, and develop more efficient product promotion plans. Through public opinion supervision, the government can respond quickly to public events and use social relationship data on social networking sites to conduct emotional analysis so as to provide emotional counseling for Internet users. Text sentiment analysis has achieved satisfactory results on text with regular content and form. In social networks, however, texts are short, their syntax is irregular, and their language is rich and noisy, which exacerbates the vocabulary sparsity problem, so the performance of traditional text sentiment analysis methods drops sharply when they are applied to social networks. At the same time, social networks contain a lot of other valuable information, such as the social relationship information generated by user interaction [3] and the various forms of expression contained in microblog content. How to make use of the characteristics of social networks to conduct emotion analysis has become a topic worthy of study.
It is especially important to detect sentiment on social media early, before content spreads widely, so that action can be taken sooner: the more widely content spreads on social networks, the more likely people are to believe it and the greater the damage. Through the analysis of the above studies, this paper finds the following problems in online social network sentiment analysis and detection. First, when the characteristics of the original topic data and the social comment data are combined, the directivity of the social comment data is unclear. Second, when constructing features, the input sequence is too long for the model to receive completely, resulting in incomplete features. Third, online social network sentiment analysis and detection are strongly related to the timeliness of news. If the training data are far removed in time from the data to be classified, the accuracy of the model will be greatly reduced. How to correctly use the pretrained model is the key to solving this problem.
Sentiment analysis in social networks has important applications. This part applies social network sentiment analysis to emergency discovery in microblog data streams. How to detect emergencies from microblog data streams is a hot research issue in natural language processing and social network analysis. However, existing methods that use text clustering and microblog tags suffer from low accuracy and efficiency. In view of these deficiencies, an online microblog emergency discovery model based on an emotion co-occurrence map and tag extraction is proposed. Traditional sentiment analysis usually divides emotions into two or three categories. This part uses the emotion wheel, which is widely recognized in psychology, to define emotion types and builds a new multicategory sentiment dictionary and emotion co-occurrence map offline. Unsupervised classification of microblog emotion is carried out using the emotion co-occurrence map, and the classification results are then applied to emergency discovery to detect the emergency state of the microblog stream. From the detected emergency state, the tags in the emergency period are extracted, and the split tags are used as keywords to find emergencies. Words in Weibo that are highly correlated with the tag keywords are extracted to describe emergencies together with the tags.

Related Work
With the progress of network technology, a large number of new websites have appeared. Most of these sites are user-centric, and users can post whatever information they want. Among them, online social networks such as Weibo, Facebook, and Twitter are the most popular. Sentiment analysis has also moved from news, movie reviews, and long blogs to the analysis of content on online social networks. Experiments verify that content in online social networks can truly reflect people's attitude changes in real life [4]; that is, people's emotions on online social networks are consistent with those in real life. Due to its great academic and economic value, sentiment analysis of online social networks has become a hot research issue in recent years.
Early work pioneered sentiment analysis on social networks by using emoticons on Twitter to label microblogs [5] and then applying classification methods to them. More precise methods followed. For example, a method to disambiguate adjectives using a Bayesian classifier is proposed [6]. Another study proposes a method for detecting sarcasm in Twitter and Amazon reviews. In this method, the patterns of sarcastic statements are defined, punctuation marks are added as features, and the KNN algorithm is used for classification. In another approach, the sentence is represented as a core tree, and several newly defined features are added on the basis of unigrams. The proposed method calculates the similarity of subtrees [7] and then conducts sentiment analysis. The dataset used in this method is manually annotated Twitter data. Using N-tuples, dictionary features, and POS features, Adaboost is used as a classifier to train on texts [8]. A microblog sentiment analysis method combining manually selected features, different sentiment dictionary features, and traditional features is proposed [9]. Emoticons and labels were used as emotional markers to classify microblogs; four different features of microblogs were extracted (punctuation marks, words, N-tuples, and expression patterns), and K-nearest neighbor (KNN) was then used as the classifier [10]. A target-level emotion analysis method based on target dependence was proposed. The authors represented the target as a continuously distributed vector, which was input together with the word vectors of the text as features into a long short-term memory (LSTM) neural network model with an attention mechanism [11]. Generalized information, repeated punctuation, and repeated letters are extracted from microblogs, and a co-occurrence map is constructed through the label propagation algorithm to classify microblogs emotionally [12].
An emotion dictionary is established based on the relationship between words and emoticons; emotional features are then extracted from the dictionary and combined with other features to complete the emotion analysis of Weibo [13]. Before the emotional analysis of microblogs, an established dictionary of subjective words is used to classify microblogs as subjective or objective, and the emotional orientation of the microblogs is then judged [14]. Another paper uses two kinds of features: meta-information and grammatical information of the microblog. Subjectivity and polarity come from the dictionary. To ensure the integrity of the features, the authors added popular Internet words to the dictionary and used an SVM to train on the data [15]. Using the textual context of microblogs for emotion analysis, another study treated microblogs in the same conversation as the text context and established hierarchical LSTM (Long Short-Term Memory) networks with two attention mechanisms to analyze microblog emotion. The methods in the above literature only use the text information of microblogs for classification, without considering the influence of information other than text on sentiment analysis.
For automated detection, most early studies of social network emotion used traditional machine learning algorithms, such as studies of the features of the original posters and of topic readers [16]. Their analysis showed that the users concerned publish fewer posts per day than ordinary users. The researchers trained five classification models, and the experimental results show that the social network sentiment analysis detection model based on user characteristics improves discrimination accuracy. The online social network sentiment analysis and detection model based on user characteristics can improve detection accuracy to a certain extent, but it also has a serious problem: normal users may spread the content without knowing it, so the misjudgment rate of this method is relatively high. Cue words in the text have been constructed and analyzed in an attempt to make early predictions in social network sentiment analysis [17]. For health-related social network sentiment analysis, the characteristics of the information sources of samples, such as the URL and site name of the source website, were analyzed, and a logistic regression model was then used for detection. Network features in comments have been integrated [18] and added to the traditional features, resulting in a great improvement in results. A social graph has been proposed to simulate the interaction between users and identify influential communicators in social network sentiment analysis [19]. Heterogeneous networks have various types of nodes or edges, such as a three-layer relationship network among news creators, social network content, and users.
One study uses entity embedding and relationship modeling to build a hybrid social network sentiment analysis and detection framework [20]. These studies show that the structure of the propagation network plays an important role in social network emotion analysis. However, this approach requires a fairly complete propagation network structure, and the complexity of the model is also higher, so it is of limited use for early detection. Another paper constructs features of four aspects, namely, text, user, communication, and topic, and then uses the decision tree algorithm for social network sentiment analysis and detection [21]. A further method focuses mainly on time-related features and establishes a random forest classifier for detection. This study shows that social network sentiment analysis may be greatly affected by time.
A propagation path classifier with a convolutional neural network is used for early sentiment analysis and detection in social networks. Experimental results show that the propagation path model performs better than feature-based algorithms to a certain extent. However, in the early stage of propagation there is usually not much propagation information available, so the model is difficult to apply in practice [22]. A propagation tree kernel is also proposed. This study focuses on the propagation structure, but the method ignores key temporal characteristics. Therefore, another social network emotion analysis and detection model based on a recurrent neural network is proposed [23]. The hidden-layer vector in the model can capture how the context information of the relevant topic data changes over time. Experimental results show that the recurrent neural network performs well in the social network emotion analysis and detection task and is more accurate in recognition. Similarly, an approach based on a long short-term memory (LSTM) layer followed by several dense layers with rectified linear units is proposed. Recurrent neural networks are good at processing time-series features, but when they process long sequences, gradients vanish, which hinders model training. Many data in the social network emotion analysis and detection task, however, are relatively long sequences. Further studies combined user stance and topic authenticity for multitask learning, where each task has its own gated recurrent unit (GRU) layer and the tasks also share a GRU layer [24]. A neural network model based on tree recursion is proposed, which can learn the important features in the propagation network structure of emotion in social networks [25]. Studies show that early text information and picture information are particularly important.
In general, emotion classification at the aspect level is more targeted and valuable, so it has attracted extensive attention from researchers. Sentiment analysis has achieved good results at the document level, but improving algorithms at the aspect level is more challenging. At present, research on aspect-level sentiment analysis with practical application value is still in its initial stage. In aspect sentiment analysis, entity information is more important, and a sentence may express different emotions toward different objects. Currently, the urgent problem of the aspect sentiment analysis task is to accurately identify the emotion category of the target text.

Task Definition of the Deep Pretraining Language Model
The input is defined as a sentence of n words that contains aspect words. The purpose of aspect sentiment analysis is to determine the emotional tendency toward specific aspect words in a sentence. In a low-dimensional space, word embeddings give a more comprehensive representation of the meaning of each word than one-hot vectors. In this paper, the words in the dataset are mapped into continuous, dense, low-dimensional vectors, and all the word vectors are stacked into the word embedding matrix, where V represents the vocabulary size and D represents the dimension of the word embedding. The framework diagram of social network sentiment analysis is shown in Figure 1. The vector calculated by the sentence attention layer is input into the classifier, and the softmax function is used to calculate the probability of each emotion category, as in equation (1).
Social network sentiment analysis aims to extract emotional information from information posted on social media sites. Sentiment analysis of social networks that integrates social relations mainly uses the social relations among users to build connections between published contents and expands the data so that emotions can be analyzed more accurately.
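The softmax classification step described above can be illustrated with a minimal sketch. The label order, the function names, and the example scores are assumptions made for illustration; the classifier's actual weights and the exact form of equation (1) are not reproduced here.

```python
import math

# Assumed label order; the source only names the three categories.
LABELS = ("negative", "neutral", "positive")


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable label and the full probability distribution."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Hypothetical scores produced by the classifier layer for one sentence.
label, probs = classify([0.2, 1.1, 3.0])
```

The predicted category is simply the label with the highest probability; the probabilities themselves can be kept when a confidence value is needed.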

Mathematical Problems in Engineering
Compared with traditional sentiment analysis, sentiment analysis on social networks is more challenging due to the limitation of content popularity, the randomness of sentence content, and the mutual influence of emotions and behaviors among users. The accuracy of sentiment analysis is affected by the flexible form of short text and the way microblog content is expressed. For example, when a product is reviewed, very few user comments can hinder sentiment analysis. To solve the problem of data sparsity, many researchers in recent years have used sociological theories to expand data. In addition to published content, social network platforms also include users' personal information and interaction behaviors. Active interaction behaviors include following other users, posting microblog content, and liking, commenting on, and forwarding other microblog content. Passive interactions include being followed by other users and having one's Weibo content liked, commented on, and forwarded. Through personal information and interactive activities, relationship information between users, between users and microblogs, and among microblogs themselves is established, which plays a significant role in social network sentiment analysis.

Fusion Classification Model of Social Emotion Analysis Based on BERT and LSTM
After some routine preprocessing, BERT is used for training. After the training, characters are used in the input layer. In this chapter, various LSTM classifiers are used for experiments, hence the name be-LSTM. Finally, the output of emotion classification is obtained through the fully connected layer. The results are presented in the form of two types of tags.
The data flow of the emotion analysis task is shown in Figure 2.
For BERT embedding training, the text is input sentence by sentence. Word embedding is realized through encoding and BERT, and the optimized word vector matrix is obtained after training.
The embedding results are used as the initialization matrix of the embedding layer (the embedding layer in Figure 2) and then as the input of the LSTM layer, which enters the neural network for emotion classification training. The output is then passed through the dropout layer, the dense layer, and the activation function. Dropout is designed to prevent the model from overfitting.
According to an analysis of the number of comments in the Twitter dataset and the microblog dataset, the number of comments in the microblog dataset is significantly higher than that in the Twitter dataset. On the one hand, this indicates that Weibo has a large number of users and that users are actively involved in hot topics. On the other hand, too many comments also mean that there may be malicious users who flood social platforms with comments, resulting in an abnormal number of comments. Moreover, there may be more than one such malicious user, so it is necessary to deal with outliers. For such abnormal data, this paper filters out the excess comments in the record so that the number of comments kept in the record equals the upper quartile of the total number of comments. The filtering is carried out through random sampling.
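The outlier handling described above can be sketched as follows. This is only one plausible reading: the upper quartile is computed over the per-topic comment counts, and any topic with more comments than that is randomly downsampled to the cap. The function names, the fixed seed, and the toy data are assumptions.

```python
import random
import statistics


def upper_quartile(counts):
    """Third quartile (Q3) of the per-topic comment counts."""
    return statistics.quantiles(counts, n=4, method="inclusive")[2]


def cap_comments(topics, seed=0):
    """Randomly downsample every topic whose comment count exceeds Q3."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cap = int(upper_quartile([len(c) for c in topics.values()]))
    capped = {}
    for topic, comments in topics.items():
        if len(comments) > cap:
            capped[topic] = rng.sample(comments, cap)
        else:
            capped[topic] = list(comments)
    return cap, capped


# Hypothetical topics with 2, 4, 6, 8, and 100 comments each.
topics = {
    f"t{i}": [f"c{j}" for j in range(n)]
    for i, n in enumerate([2, 4, 6, 8, 100])
}
cap, capped = cap_comments(topics)
```

Random sampling keeps the retained comments representative of the whole record instead of favoring the earliest or latest ones.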
For excessively long input samples, the model cannot take a single topic record together with all of the comment texts under it as input, because the dimension of the vector representation is fixed. A truncation method is therefore proposed that extracts features by keeping the head, the tail, or the middle of the input sample. According to the distribution of sequence lengths, outliers are filtered out and the lower quartile is used as the maximum sequence length of the model's training input, so that every sample is stored as a fixed-length input for training. Input samples that exceed the maximum sequence length are truncated with the above method. On top of the three kinds of sequence truncation, different ways of organizing the comments affect the representation of the text differently. One way is to splice the comments of each topic record together first and then truncate them. The retained part of the sequence is complete and continuous, but a lot of context information is lost, which is not conducive to the model's understanding of context. Another way is to take the sequence lengths of all the social comments under the topic and the maximum sequence length, calculate a step length, divide the text sequence into several parts, and finally extract text from the sequence by that step. Although some comment information may become discontinuous, this way can improve the model's ability to understand the context.
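The truncation schemes above can be sketched in a few lines. The head, tail, and middle modes follow the text directly; the strided variant is only one possible reading of the step-based extraction (split the sequence into equal parts and keep the opening tokens of each part). All names and the `parts` parameter are assumptions.

```python
def truncate(tokens, max_len, mode="head"):
    """Keep max_len tokens from the start, end, or centre of the sequence."""
    if len(tokens) <= max_len:
        return list(tokens)
    if mode == "head":
        return tokens[:max_len]
    if mode == "tail":
        return tokens[-max_len:]
    if mode == "middle":
        start = (len(tokens) - max_len) // 2
        return tokens[start:start + max_len]
    raise ValueError(f"unknown mode: {mode}")


def strided_extract(tokens, max_len, parts):
    """Split tokens into `parts` segments and keep the head of each segment."""
    if len(tokens) <= max_len:
        return list(tokens)
    seg_len = -(-len(tokens) // parts)  # ceiling division: segment size
    keep = max_len // parts             # tokens kept from each segment
    out = []
    for start in range(0, len(tokens), seg_len):
        out.extend(tokens[start:start + keep])
    return out[:max_len]


tokens = list(range(20))  # stand-in for a spliced comment sequence
head = truncate(tokens, 8, "head")
middle = truncate(tokens, 8, "middle")
strided = strided_extract(tokens, 8, parts=4)
```

The first three modes return one contiguous span, while the strided variant samples from the whole sequence at the cost of continuity, which mirrors the trade-off described in the text.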

Deep Pretraining Data Filtering Strategy for Social Comments Based on the Sentiment Analysis Model
In addition to the sequence truncation scheme, a sentiment analysis model is proposed to analyze the sentiment of each comment and filter the comment data. Finally, a comparative experiment is carried out to find a better filtering strategy. Before the social comment data are filtered, sentiment analysis is performed on the comments of each published topic, and the emotional strength of the comments of each record is counted separately. The emotion analysis model calculates an emotion score, which is then used to rank emotions into three categories: positive, negative, and neutral.
The natural language processing model CoreNLP (a collection of tools for working with natural languages) was used for sentiment analysis of the social comment data recorded for each topic in the English Twitter dataset. When this model conducts sentiment analysis, it first divides a paragraph into multiple sentences and then classifies each sentence, so the result is a per-sentence classification into three categories (positive, negative, and neutral), while a comment may contain multiple sentences. It is therefore necessary to design a corresponding algorithm for the sentiment analysis of social comments, as shown in Table 1.
For the sentences of a social comment, if the sentence counts of all categories other than positive are 0 in the statistical result of sentiment analysis, the emotional polarity of the comment can be judged directly as positive. Negative and neutral emotions are treated in the same way. When the number of negative sentences is 0, the comment is judged as neutral if the number of neutral sentences is more than twice the number of positive sentences. Here, the weight of the positive emotion category is set to be larger. When the positive and negative emotion categories are compared, the weight of the negative emotion category is set to be larger: the comment is identified as positive only when the number of positive sentences is more than twice the number of negative sentences.
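The aggregation rules above can be expressed as a short function. This sketch assumes that the per-sentence labels have already been produced by the sentence-level classifier; the label strings and function name are illustrative, and the checks are applied as an ordered chain so that each comment is counted once.

```python
from collections import Counter


def comment_polarity(sentence_labels):
    """Aggregate per-sentence labels into one comment-level polarity."""
    counts = Counter(sentence_labels)
    pos = counts["positive"]
    neg = counts["negative"]
    neu = counts["neutral"]
    # No negative sentences: neutral must outnumber positive 2:1 to win.
    if neg == 0 and neu > 2 * pos:
        return "neutral"
    if neg == 0 and neu < 2 * pos:
        return "positive"
    # Otherwise positive must outnumber negative 2:1 to win.
    if pos > 2 * neg:
        return "positive"
    return "negative"


# Hypothetical comments, each given as a list of sentence labels.
comments = [
    ["positive", "positive"],
    ["neutral", "neutral", "neutral", "positive"],
    ["positive", "negative"],
]
tally = Counter(comment_polarity(labels) for labels in comments)
```

The 2:1 thresholds encode the weighting described in the text: positive is favored over neutral, and negative is favored over positive.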
For the social comment data of hot topics in the Chinese Weibo dataset, several experiments show that the SentA sentiment analysis model developed by Baidu achieves higher accuracy than the CoreNLP model developed by Stanford University, so the SentA model is used to analyze the Chinese dataset. This model can only discriminate two types of sentences, positive or negative, but it can discriminate multiple sentences at once. Therefore, the sentiment analysis algorithm for social comments in the Chinese microblog dataset differs from that of the natural language processing model CoreNLP, as shown in Table 2. Figure 3 shows the statistical analysis of the emotional polarity of social comments on the original topics in the English Twitter dataset and the Chinese Weibo dataset, respectively. As can be seen from the figure, negative comments take up a relatively large proportion in the Twitter dataset, while positive comments take up a relatively large proportion in the Weibo dataset.
The framework of the microblog sentiment analysis method based on weak dependency relationships is shown in Figure 4. In this model, the user-user relationship and the user-microblog relationship are first used to construct the direct relationship graph of microblogs. Second, a community discovery algorithm is applied to the direct relationship graph to obtain the weak dependency relationships between microblogs. Then, this chapter combines the user context, the user relationship context, and the weak dependency context to obtain the social context matrix A, as shown in Figure 4. Finally, a microblog sentiment analysis model is established which, in addition to using microblog information (feature matrix X) as the model feature, further considers the influence of the social context matrix A on microblog sentiment analysis.
Emotion Analysis Model Based on BERT and Improved Hierarchical Attention Mechanism
The attention of the entity layer focuses on the words that are important to the attribute words, and the vector is then refined into contextual semantic information that helps update the matching of attribute words, aspects, and emotions on the vector. Then, the vector is passed to the attention mechanism of the sentence layer, and the feature vector of the task-related attribute words is modeled. The rational use and combination of attention makes the model focus on task-related information rather than on unrelated words in the sequence containing the attribute words. The introduction of BERT encodes context-related information and extracts deeper semantic features of the text. The combination of BERT and improved hierarchical attention has a significant effect.
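As an illustration of how an attention layer concentrates on task-related words, the following minimal sketch computes a generic attention-weighted sum. It is not the paper's hierarchical model: the dot-product scoring function, the use of an attribute-word vector as the query, and the toy vectors are all assumptions.

```python
import math


def attention_pool(query, states):
    """Weight each hidden state by softmax(query . state) and sum them."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    shift = max(scores)  # stabilise the exponentials
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical attribute-word query and three word-level hidden states.
query = [1.0, 0.0]
states = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
weights, context = attention_pool(query, states)
```

Words whose hidden states align with the query receive larger weights, so the pooled context vector is dominated by the words most relevant to the attribute word.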

Example Verification
To verify the validity of the proposed deep pretraining language model, experiments were performed on two real Twitter datasets, HCR and OMD. The results are shown in Figure 5.
From the experimental results in Figure 5, the following conclusions can be drawn. Using any of the three social contexts improves the accuracy of microblog sentiment analysis on the two datasets. All methods that use social context achieve higher accuracy than the method that uses text alone, which to some extent verifies the positive relationship between the user context, the weak dependency context, and microblog emotion labels, and also supports the validity and rationality of the assumptions in this chapter. Microblogs connected by social context are more likely to have similar emotions. Homogeneity of nodes does exist in online social networks, which is why different social contexts can improve the effect of sentiment analysis. In addition, the experimental results can serve as experimental evidence for the homogeneity phenomenon in social networks.
Table 1: Deeply pretrained sentiment analysis of social comments on Twitter hot topics.
Input: social comments of each Twitter hot topic
Output: sentiment category statistics of the social comments
active_count ← 0, passive_count ← 0, neutrality_count ← 0
For each data_i in hot_topics_data_list do
    For each comment_j in data_i.comments do
        res ← nlp.annotate(comment_j)
        active ← 0, passive ← 0, neutrality ← 0
        For each sentence_k in res["sentences"] do
            Calculate the sentiment classification of the sentence in the social comment data
            count_sentiment(sentence_k, active, passive, neutrality)
        If (passive is 0) and (neutrality > 2 * active) then
            neutrality_count ← neutrality_count + 1
        Else if (passive is 0) and (neutrality < 2 * active) then
            active_count ← active_count + 1
        Else if active > 2 * passive then
            active_count ← active_count + 1
        Else
            passive_count ← passive_count + 1
Return active_count, passive_count, neutrality_count

To further explore why different weak dependency extraction methods perform differently in microblog sentiment analysis, this section analyzes the statistics of the community discovery results (as shown in Table 3). Since there are no ground-truth community discovery results for the datasets, and it is difficult to find a reasonable standard for manually labeling such large-scale real network datasets, modularity is used as the standard to measure the performance of the community discovery algorithms. As shown in Table 3, for both the OMD dataset and the HCR dataset, the partition obtained by Louvain has the smallest number of communities and the largest modularity, which may be the reason why it achieves the best result in microblog sentiment analysis. Combining Figure 5 and Table 3, it can be found that both the number of communities and the modularity affect the accuracy of microblog sentiment analysis.
The effects of word2vec and BERT as word embeddings on classification were also compared. In addition, an SVM was added as the benchmark control group. As shown in Table 4, in the experimental results on the Twitter dataset, the average accuracy of the SVM is only 49.91%, which is far lower than that of the neural networks.
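For reference, the modularity measure used above to compare community partitions can be computed with the standard Newman formula, Q = Σ_c [L_c / m − (d_c / 2m)²], where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. The sketch below assumes an undirected, unweighted graph; the function name and toy graph are illustrative and are not taken from the experiments.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in community c
    degree = defaultdict(int)    # total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

A partition that keeps densely connected nodes together scores higher than one that lumps everything into a single community, which is why modularity can stand in for missing ground-truth labels.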
The learning curve of the deep pretraining language model for social network sentiment analysis is shown in Figure 6. Figure 7 shows the loss curve and accuracy curve drawn from the model training log after data optimization.
As can be seen from the experimental results, for the Twitter dataset, the model begins to converge when the number of training iterations exceeds about 30K. For the microblog dataset, the model begins to converge at around 25K training iterations. This also indicates that, after the text features are optimized with the new method, the model achieves relatively high accuracy in the discriminant detection task of social network sentiment analysis. The distribution of the number of microblogs in the dataset is shown in Figure 8, which gives hourly statistics of the number of microblogs posted over 70 days. As can be seen from Figure 8, people's posting behavior on microblogs basically follows the human rhythm of work and rest.
This chapter makes a statistical analysis of microblogs containing emotion, microblogs containing emotion symbols, and microblogs containing labels. The analysis results are shown in Figure 9. As can be seen from the figure, there are more microblogs containing emotion than microblogs containing emotion symbols: the former account for about 42% of the total number of microblogs, while the latter account for only 31%. This reflects the effectiveness of emotion as a feature item in emergency discovery. Microblogs containing tags account for only 13% of the total, indicating that writing tags is not a habit of Sina Weibo users. Therefore, to improve the accuracy of emergency discovery, the tag-based algorithm is improved. It can also be seen from the figure that, around the 20th day, the number and proportion of labels decrease while those of emotion symbols and emotions increase.


Conclusion
Building on the traditional CNN and LSTM network models and borrowing the gating units of LSTM, the CNN feeds the output vector of the gated convolution layer into the emotion classifier layer for classification. A multilayer network with a certain number of neurons is constructed, and CNN and LSTM with a special processing mode between layers are used to complete feature extraction. Each model is applied to the multiclass problem of social network emotion analysis, and the performance differences between the models are analyzed through the characteristics and recognition ability of the different deep pretrained language models and recurrent neural networks. After the text information is obtained, contextual key phrases, sequence information, and semantic dependencies are encoded. Algorithms are then used to filter background noise and obtain more accurate samples, so that the machine's understanding is closer to that of humans. Steps such as convolution and pooling extract emotional features from the text and help the machine understand its real meaning. An entity aspect emotion analysis model based on BERT and an improved hierarchical attention mechanism is proposed. BERT is used as an embedding layer to refine context-related aspect vectors on top of the existing model, and attribute words are effectively matched with aspects and emotions to obtain the emotional polarity of attribute word and aspect pairs.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.