Transaction Prediction in Blockchain: A Negative Link Prediction Algorithm Based on the Sentiment Analysis and Balance Theory

User relationship prediction in the transaction of Blockchain is to predict whether a transaction will occur between two users in the future, which can be abstracted into the link prediction problem. The link prediction can be categorized into the positive one and the negative one. However, the existing negative link prediction algorithms mainly consider the number of negative user interactions and lack the full use of emotion characteristics in user interactions. To solve this problem, this paper proposes a negative link prediction algorithm based on the sentiment analysis and balance theory. Firstly, the user interaction matrix is constructed based on calculating the intensity of emotion polarity for social network texts, and a reliability weight matrix (noted as RW-matrix) is constructed based on the user interaction matrix to measure the reliability of negative links. Secondly, with the RW-matrix, a negative link prediction algorithm is proposed based on the structural balance theory by constructing negative link sample sets and extracting sample features. To evaluate the performance of the negative link prediction algorithm proposed, the variable management method is used to analyze the influence of negative sample control error and other parameters on the accuracy of it. Compared with the existing prediction benchmark algorithms, the experimental results demonstrate that the proposed negative link prediction algorithm can improve the accuracy of prediction significantly and deliver good performances.


Introduction
Everything in our lives has been digitized with the network technologies including the wireless mesh network with the topology preservation [1] and the wireless sensor networks using the Markov random field [2], both of which assist in providing the last mile Internet access for users. Moreover, with the development of the Blockchain, the technology behind the Bitcoin cryptocurrency system [3], many kinds of cryptocurrencies have been used in digital transactions. The link prediction algorithms can be used to predict whether a cryptocurrency transaction relationship will occur between the users. Tanevski et al. [4] use the Bitcoin OTC network to predict whenever a possible transaction will be made between two users in the network. In the transactions of Blockchain, the node represents a user, and the link represents the trust degree or the ratings or the assessments of the users given to each other after they made a transaction; the value of the link can be either numbers or texts. Thus, the link prediction algorithm, which is based on the information of user relations and user attributes in various networks, will provide more information and support for the decisionmaking of users who make transactions based on Blockchain technology.
Link prediction is mainly divided into the user similarity matrix [5] and the machine learning-based method [6]. In the user similarity matrix, the value represents the similarity between two nodes, and when the value is greater, the possibility of the existence of links between nodes is greater. The machine learning-based method is to create a model with a set of adjustable parameters. The optimal parameter value is found by the optimization strategy, so the obtained model can reproduce the real network structures and relationship characteristics better. In addition, Yuan and Pradeep found that adding emotional features can improve the accuracy of link prediction [7]. According to the user's emotional features extracted from different topics, the more likely the two users have common emotional tendencies, the more likely they are to be friends.
However, most platforms prefer to exhibit the positive sentiments and conceal the negative ones. For example, the users can express their positive sentiments with the thumbs-up icon or other functions directly. In contrast, if users want to express negative sentiments, they can only leave messages in the comments section. Hence, most social platforms ignore the negative data; the importance of negative link prediction is underestimated and underutilized [8]. Fortunately, researchers found that the negative link is a good complement to the positive link [9]. The negative link prediction can use the positive link and the interactions of users in the network to predict the possible negative relationships among users, where the interaction among users includes the number of interactions, interaction tendency, and interaction intensity [10]. The existing negative link prediction algorithms mainly focus on the number of negative interactions of users and lack the full utilization of the emotional features in the interaction of users which can improve the accuracy of link prediction [11].
Therefore, this paper proposes a negative link prediction algorithm based on the sentiment analysis. Firstly, with the combination of the sentiment analysis method and the social network, this paper proposes a method to calculate the intensity of the emotional polarity for social network texts. On this basis, we propose a method to construct the user interaction relationship matrix. Based on the user interaction relationship matrix, we construct a reliability weight matrix (noted as RW-matrix) for measuring the reliability of negative links. Secondly, we construct a negative link sample set based on the interactions of users and extract the sample features. Then, the paper proposes a negative link prediction algorithm with the structural balance theory (noted as SABT-NLP).
The remainder of this paper is organized as follows. Section 2 discusses the related work. Section 3 introduces the preliminaries of negative link prediction, and Section 4 describes the details of the proposed methodology. Section 5 presents the experimental evaluation, and finally, Section 6 gives the conclusion.

Related Work
Link prediction can use the existing network topology and other information contained in the network to predict the possibility of future connections in the network which have not yet been connected in the network. Symbol networks are one of the most representative methods used in link prediction [12,13]. Symbol networks are networks with positive and negative signs. For example, YouTube allows users to utilize some of the features to express their opinions about whether they like a video or not. Epinions allows users to rate other users' contents. Such functions in social networks contribute to the development of the symbol network.
Liben-Nowell and Kleinberg [14] proposed a link prediction model and pointed out that the link prediction mainly depends on the similarity between nodes. The more similar the two nodes are, the more likely that a link exists, which can be determined based on the common neighbor method and common path method. However, such methodology is not so suitable for large-scale social networks because of the high costs caused by computing feature values. Song et al. [15] proposed a matrix decomposition method to solve the problem of node similarity in online social networks, which can be applied to large-scale social networks. However, it raises a trick problem; i.e., online social networks do not completely correspond to offline social relationships. Then, Fire et al. [16] proposed a prediction algorithm for the cases lacking some offline friends. Such proposed algorithm can adapt to large-scale social networks as well. Moreover, this methodology can help online social network users to explicitly find people who either know each other or have similar interests with them.
In addition to the analysis of the node similarity to help the link prediction, the attributes like vertices and edges can also be extracted in different scenarios to improve the prediction performance. Benchettara et al. [17] proposed a link prediction algorithm for bipartite social networks based on the extracted attributes of vertices and edges.
Link prediction includes not only the positive one but also the negative one. Leskovec et al. [18] pointed out that the information contained in the negative link can effectively improve the prediction accuracy. Kunegis et al. [19] also confirmed that the negative link prediction can have added value to the social network analysis. Nevertheless, the existing negative link prediction algorithms mainly consider the number of negative interactions among users [18,19], lacking the full utilization of the sentiment features in the interaction between users. Yuan and Pradeep [7] pointed out that adding emotional features can improve the accuracy of link prediction. Therefore, our paper mainly focuses on how to combine the user interaction information and sentiment analysis to solve the negative prediction problem.

Preliminaries
To better understand how to utilize the user interaction information and the positive link to predict the negative link, the basic definition is as follows.
First, a relation networkg p is given that contains the positive link,Ais the user content relation matrix,Ois the user opinion relation matrix, and then a predictorf is generated byg p ,A, andO, which can predict the negative relation network. Before illustrating our proposed method about the negative link prediction, the meanings of symbols which will be used are shown in Table 1.

Sentiment Polarity Intensity Quantification.
In social networks, texts have the following characteristics: huge in volume, short in length, disorganized in words, and freely expressed in grammar. To perform sentiment analysis and quantification of network texts, we design a method for the short text of the social networks by using polarity intensity quantification. The intensity of the text is quantified based on the polarity intensity of each sentiment word.

Wireless Communications and Mobile Computing
The sentiment words can be divided into basic sentiment words and compound sentiment words. The polarity of the basic sentiment words is based on the SentiWordNet annotation of the sentiment dictionary. The polarity intensity calculation of the compound sentiment words is complicated, which can be determined by the following semantic rules: (1) Degree Modifiers+Basic Sentiment Words. Degree modifiers give different weights such as ð0:5, 0:7, 0:9, 1:1, 1:3, 1:5 Þ depending on the intensity of action. For example, the weight 1.5 represents the sentiment intensity of "super," the weight 1.3 represents the sentiment intensity of "very," the weight 1.1 represents the sentiment intensity of "a little," and the weight 0.5 represents the sentiment intensity of "little." The polarity intensity of this type of compound words is the product of the intensity of the degree modifier and the intensity of the basic sentiment word. If the product exceeds the interval ½−1, 1, its boundary is used as the polarity of the compound word.
(2) Repeated Degree Modifiers. For example, the sentiment intensity of "really really like" is stronger than that of "really like." The weight of two "really" needs to be multiplied on the basis of the "like" weight. If the product exceeds the interval ½−1, 1, its boundary is used as the polarity of the compound words.
(3) Negatives+Basic Sentiment Words. Such compound words only need to reverse the polarity of the original emotional words. For example, the sentiment intensity of "not good" is the reverse of the intensity of "good." (4) Negatives, Degree Modifiers, and Basic Sentiment Words Occur Continuously. The combination of such compound words is complicated. Moreover, the different order of appearance of the negatives and degree modifiers will produce the opposite sentiment tendency. If the degree modifiers appear in the middle between the negative and the basic sen-timent word, the polarity is the same as that of the basic sentiment word, and the intensity is equal to the negation of the degree modifier. However, if the degree modifier appears before the negative and the basic sentiment word, the polarity of the compound word should be inverted on the polarity of the basic sentiment word, and the intensity should be multiplied by the weight of the degree modifier based on the basic sentiment word. If the product exceeds the interval ½−1, 1, the boundary is used as the polarity intensity of the compound word.
(5) Emoticons. Emoticons are one of the popular paradigms for users to express their emotional tendency with graphic animations. In order to express richer emotions, we add emoticons to the sentiment dictionary. Based on the above semantic rules and the sentiment dictionary, we can compute the polarity and intensity of each sentiment word, which is shown as Equation (1) below: where SentiðtÞ represents the sentiment polarity and intensity of the text, t is the text to be quantified, and a 1 , a 2 , a 3 , ⋯, a n is a sequence of all emotional words in the text t after sorted from large to small with the intensity. If SentiðtÞ is positive, the larger the SentiðtÞ is, and the stronger the positive sentiment of the text t is. However, if SentiðtÞ is negative, the smaller the SentiðtÞ is, and the stronger the negative sentiment of the text t is.

User Interaction Matrix Construction.
The user interaction matrix is a comprehensive description to fully express the relationship between users. The interaction between users mainly includes point of praise, point stepping, forwarding,  (4) We set E = A × S T , which matches the intensity of user opinions with the users to represent the emotional intensity of user interaction. We calculate NE = A × ððS − jSjÞ/2Þ T to represent the negative emotional intensity of user interaction where N ij represents the interaction intensity between the users u i and u j . We can see that the more interactive times between users, the more intense the user interaction. The denominator of this formula is the max value of user emotional interaction to all other users, which is used to normalize the formula where NN ij represents the negative interaction intensity between users u i and u j 3.3. Reliability Weight Matrix Construction. The U = fu 1 , u 2 , ⋯, u m g represents the set of users in the network, where m represents the number of users. A symbol network can be divided into a positive network subgraph g p ðU, E p Þ and a negative one g n ðU, E n Þ, where E p and E n represent the pair of users with a positive link and a negative one, respectively. E o indicates the pair of users without the links, and the negative link prediction needs to construct the negative link sample from the unlabeled E n ∪ E o . P = fp 1 , p 2 , ⋯, p M g represents a collection of content published by users, and M represents the number of content.
Based on the related user interaction matrix constructed in Section 3.2, the reliability weight matrix W is defined as Equation (2) below: 3.4. Structural Balance Theory Equation.
The basis of the symbol network is the structural balance theory. The balance theory examines the relationship of a triple, which considers that only "the friend's friend is my friend, the enemy's enemy is my friend" is a balanced relationship, and the other is unbalanced. Only the balanced relationship is stable, and the unbalanced relationship has the tendency to transform into a balanced relationship. The triple of the balance theory model is represented as a triangle with three edges, where the plus sign (+) is used to represent the positive relation on an edge, and the minus sign (-) is used to represent the negative one on an edge. The balance of the triangle structure can be determined by the product of three edges. If the product is positive, the structure is balanced. If the product is negative, the structure is unbalanced.
For example, suppose that s ij indicates the relationship between the users u i and u j , which can be considered an edge in a triangle. If s ij = 1, it indicates that there is a positive relationship between u i and u j . In contrast, if s ij = −1, it indicates that there is a negative relationship between u i and u j . For example, the triple hu i , u j , u k i could be balanced when: s ij = 1, s jk = 1, and s ik = 1 or s ij = −1, s jk = −1, and s ik = 1.
For the negative candidate users, we utilize a triple, hu i , u j , u k i, to determine whether they are the real, active, and existing users. The process is performed as follows: suppose that there are two users, saying u i and u k . If there is a third user u j that is located in the middle of the users u i and u k , the triple is constructed. If the production of three edges in such triple does not meet the requirements of the structural balance theory, then we consider that the negative link is unstable and excluded from the negative link candidate set.
To make generalization better, we specify a matrix B. The x h and x l represent links hu i , u k i and hu j , u k i, respectively. If the hu i , u j i is a positive link, both x h and x l are available, then B hl = 1. Otherwise, if both x h and x l are unavailable, then B hl = 0. According to the structural balance theory, if B hl = 1 , then the x h and x l must be the same type of link. Therefore, the balance theory equation is computed as Equation (3) below: where ℓ is the Laplacian matrix on B. Equation (3) will be inferred in the negative link prediction algorithm of Section 4.3.

Negative Link Prediction Algorithm
The link prediction algorithm can be regarded as a classification problem, and the existing links are used as labels to extract features. Unlike the traditional positive link prediction problems, because many online social networks only open positive links to the public, it is necessary to build negative link sample sets firstly, and the accuracy of the sample sets would directly affect the accuracy of negative link prediction. This section firstly describes the construction algorithm of the negative link sample set and then introduces the feature extraction of the negative link. Finally, a negative link prediction algorithm is proposed.

Negative Link Sample Set Construction Algorithm.
In this section, a negative link prediction sample set construction algorithm is proposed based on the methods presented in Section 3 of the sentiment polarity intensity quantification, the user interaction matrix construction, the RW-matrix construction, and the structural balance theory equation.
The basic idea of constructing the negative link sample set is to select negative interaction user pairs as negative link candidate sets from the negative interaction matrix and then use the structural balance theory and RW-matrix to further filter the candidate set to obtain highly reliable negative link samples.
The process of this proposed algorithm is described as follows: (1) Initialization of the negative link sample set NS (2) For each element of a negative user interaction matrix NN ij , if the number of user negative interactions is not 0, that is NN ij ≠ 0, add such user pair to the negative link sample candidate setNS (3) The positive link subgraph g p in the network and the negative link in the candidate set NS form a symbol network g; then, the structural balance theory is used to filter the edges in g (4) For each user pair hu i , u j i in the candidate set NSand any useru k which can form a triple set hu i , u j , u k i, if this triple set cannot satisfy the structural balance theory, hu i , u j i will be removed from the candidate set NS (5) For each user pair hu i , u j i in the candidate set NSand any useru k which can form a triple set hu i , u j , u k i, if this triple set can satisfy the structural balance theory, hu i , u j i will be preserved in the candidate set NS (6) For each user pair hu i , u j i in the candidate set NS, if ðNE ij /MAX fNE ik g k=1,⋯,m Þ < 0:5 which indicates that the negative sentiment intensity of the user u i to u j is less than half of the maximum negative emotional intensity of the user u i , to all other users, hu i , u j i will be removed from the candidate set NS So far, we can obtain highly reliable negative link samples.

Feature Extraction of the Negative Link.
From the negative link samples, we can extract features. The features of the negative links can be divided into the following categories: user features, user-user pair features, and symbol features. We explain such features in detail.
(1) User Features. It is extracted from each user node. The features of the user u i include the following information: the in-degree or out-degree of the positive link, the number of the triples which contain the user u i , the amount of content published by the user u i , the positive or negative opinions to the content published by the user u i , and the opinions that the useru i have to other users.
(2) User-User Pair Features. It is used for extracting features from each pair of users hu i , u j i. The extracted features include the following: the number of positive or negative interactions between u i and u j and between u j and u i , the Jaccard coefficient of the in-degree or out-degree between u i and u j , the shortest path between u i and u j , and the average value in respect to the sentiment intensity of positive or negative interactions between u i and u j .
(3) Symbol Features. The symbol network g is composed of the positive link subgraph g p and the negative link sample set NS, where the weight of the positive link is 1. The weight of the negative link is obtained from the reliability weight matrix. The symbol features for each pair of users include the following: the weighted in-degree or weighted outdegree of the negative links of u i and u j , the Jaccard coefficient of the in-degree or out-degree of the negative links of u i and u j , and the features of 16 weighted triples proposed by Leskovec et al. [18].
These three types of features are represented by F1, F2, and F3, respectively. In order to obtain the influence of different features on the accuracy of prediction, the importance of each feature can be determined by gradually increasing the feature and by observing the change of the accuracy rate after the new features are added to the original feature set. The detailed experiment data is analyzed in Section 5.3.

Negative Link Prediction
Algorithm. Previously, we obtain a highly reliable negative link sample set by selecting users with negative interactions from the user interaction matrix and utilizing structural balance theory and reliability matrix to filter out useless candidate sets. Meanwhile, feature extraction identifies the features required for the classification. Next, we should select the appropriate classifier to carry out the negative link prediction. Since some noises would be introduced to the construction of a negative link sample, the classifier should have the ability to tolerate noise. Here, the soft interval support vector machine is chosen as the classifier, which is proved to have better noise tolerance. Since the soft interval support vector machine (SVM) has the ability to tolerate noise, we introduce the SVM as the classifier.
Let χ = fx 1 , x 2 , ⋯, x N g be a collection of pairs of users in E n ∈ E o , where X i represents the feature vector of the user x i , 5 Wireless Communications and Mobile Computing E n represents the pair of users with the negative link, and E o indicates the pair of users without the link. The SVM in its standard form in the negative link prediction is given as Equation (4) below: where ε i , a slack variable, represents the noise tolerance of the training samples, and C is the penalty parameter (C > 0). Notice that the larger the penalty parameter C, the more the errors obtained in the classification penalty, and vice versa. In the negative link prediction algorithm, the noise levels of the positive and negative link samples are different because the positive link sample is trusted in the network and the negative is not, which is inferred from the prediction. Therefore, the slack variables C p and C n are introduced in the positive and negative links, respectively, to control the noise.
Since the reliability of the negative link is measured by its reliability weight matrix, we introduce the slack variable c j and set the negative link of hu i , u j i as x i . When the c j is used to control the noise of x i , Equation (4) is updated to Equation (5) as follows: The balance theory Equation (3) is introduced into the negative link prediction, and the slack variable C b is given to the balance theory equation to control the error. Then, we can get the updated Equation (6) as follows: Equation (6) is an optimization problem with respect to the inequality constraints. We can transform such optimization problem into a dual form. Since w can be represented as w * = ∑a i Kðx i , xÞ by the inner production of eigenvectors, Equation (6) can be updated to Equation (7) as follows: In Equation (7), K is the Gram matrix of the samples. If we set S i as Equation (8), we have Then, when the Lagrange multipliers β and γ are factored into consideration, the Lagrange function of Equation (7) is expressed as Equation (9) below: Take the derivative of b and ε, respectively, to get Equation (10) as follows: where J = ½I, 0, Y is a diagonal matrix of l × l, and l is composed of positive and negative link samples. When we compute the derivative ofaand substitute Equation (10), the optimization problem is transformed into a where The process of the negative link prediction is illustrated as follows: (1) Choose the SVM as the classifier, with Equation (4) (2) Considering the noise control of positive and negative samples, different error control variables, C p and C n , are assigned to the positive and negative samples, respectively (3) For the negative link sample x i , a variable c j is introduced to further control the error. C j = W ik and W is the RW-matrix, with Equation (5) (4) The structural balance theory is introduced into the negative link prediction, where the slack variable C b is given to the balance theory equation to control the error, with Equation (6) (5) With a series of deduction, an optimization problem is transformed into a Lagrange function following Equations (7)-(10), finally obtaining the negative link prediction model as Equation (11)

Results and Analysis
We present an in-depth discussion of our proposed negative link prediction algorithm. Section 5.1 describes the dataset used in the experiments, Section 5.2 explains the experimental platform, and Section 5.3 analyzes the experimental results.

Experimental Dataset. The Epinions is used as a dataset for experimental evaluation.
It is an open commodity review website, which allows users to evaluate the commodity, make comments on statements from other users, and rate the trust or distrust of users. In addition, it also contains the following relationships: positive-negative relationship, users-content attributions relationship, and users-user rates relationship.
The statistical results of the Epinions dataset are shown in Table 2.
For the data in Table 2, they need to be processed and filtered, and the users who have no positive or negative interactions with others need to be removed. By the way, the data quality indicators [20], such as accuracy, timeliness, completeness, and consistency, are a good choice for evaluating the quality of data for experimental data. Meanwhile, it should be noted that the negative link in Table 2 is only used as an experimental result analysis and for comparison, not for the training and classification of the negative link prediction model.

Experimental Setting.
We mainly evaluate our proposed negative link prediction algorithm with three key parameters involved in the negative link prediction algorithm: C n , c j , and C b . When testing one parameter, the remaining parameters are kept as default values. The detailed information with respect to each parameter is presented in Table 3, where the italicized data is the default value.
In order to evaluate the significance of our proposed algorithm, we choose four groups of baseline algorithms for comparisons, which are described as follows: (1) Random Algorithm. This is the baseline algorithm in the general link prediction. The links in the network are randomly marked as negative, which indicates that the sampling set of the negative link is generated at random.
(2) The Shortest Path Algorithm. The shorter the shortest path between nodes in the network, the more likely there is a link. In addition, it might have a connection for the nodes with a distance of less than threshold 2. The algorithm considers the nodes as candidates when these nodes do not have a link or the shortest distance threshold is not greater than 2. Nodes in the candidate set are marked as negative links.   (4) Balanced Negative Interaction Determination Algorithm. Based on the negative interaction determination algorithm, the balance theory is introduced to filter out useless users. In other words, if the nodes do not meet the requirements of the balance theory, they will be dropped. Otherwise, they are added to the negative link candidate set. Then, nodes in pairs are marked as negative links.

Wireless Communications and Mobile Computing
In order to obtain the influence of different features on the prediction accuracy, w adopts a stepwise increasing feature method. For example, the first feature set (F1) contains only user features. The second feature set (F1+F2) contains user features and user-user pair features. And the third feature set (F1+F2+F3) contains all the features. By adding the new features to the original feature set, we can observe the changes in the classification accuracy to determine the importance of each feature.
We use SVM and Naive Bayes as the classifiers. The classification results on the real Epinions dataset are shown in Table 4.
From Table 4, we can find that SVM can achieve higher accuracy, which indicates that the SVM classifier is more suitable for negative link prediction. Furthermore, on each classifier, the F1 value increases consecutively as the feature increases, which indicates that these three types of features are helpful in the classification. More specifically, the F1 value has the fastest growth when adding the symbol features. It means that the symbol features play a crucial role in the classification results in the negative link prediction. The experimental analysis demonstrates that the RW-matrix proposed in our algorithm is reasonable and feasible.

Experiment of Key Parameters in the Negative Link
Prediction Algorithm. We use C n , c j , and C b as the key parameters to evaluate the algorithm. The experiment is performed by using the control variable method. When one parameter is tested, the other parameters are kept as the default values.
(1) Negative Sample Error Control Parameter Cn. The experimental results are shown in Figure 1. With the change of C n , the accuracy rate and the F1 value are with a process that first 9 Wireless Communications and Mobile Computing rises, gradually stabilizes, and then decreases. The peak value is reached when C n = 0:5. When C n = 1, the positive sample error control parameter should be C p = 1; then, the positive samples and negative samples are the same as error control coefficients. The decreases in the accuracy rate and F1 value indicate that the negative sample should be given a different error control coefficient.
(2) Error Control Parameter cj for the Negative Sample xj. The negative link hu i , u j i is recorded as a negative sample x j , c j = W ij is the error control parameter of x j , and W is the reliability weight matrix (RW-matrix). The experimental results are shown in Figure 2, and c i = f ðxÞ = 1 − ð1/log ð1 + xÞÞ is a better function identified by the previous researchers, which has also achieved higher accuracy in this experiment. Due to a direct relationship between the number of negative interactions and the number of negative links, when c j = 0, the negative link prediction without considering the number of negative interactions is much less accurate. When x j takes other constants, the accuracy rate decreases to a certain degree compared with c i = f ðxÞ = 1 − ð1/log ð1 + xÞÞ. This indicates that the RW-matrix can really reflect the reliability of the negative link.
(3) Regularization Error Control Parameter Cb of the Structural Balance Theory. The experimental results are shown in Figure 3. With the change of C b , the accuracy rate and the F1 value both rise firstly and gradually decrease afterwards. C b = 0 to C b = 0:001 is a significant improvement in accuracy, indicating that the regularization equation of the structural balance theory can improve the performance of negative link prediction. The middle segment remains relatively stable, and the accuracy gradually decreases with the increase of C b . Such results indicate that the weights should be selected appropriately; otherwise, too large weights would reduce the accuracy rate.

Experiment of the RWSBT-NLP Algorithm and Baseline
Algorithm Comparison. The negative link prediction algorithm is compared with four prediction reference methods, illustrated in Section 5.2. The experimental results are shown in Figure 4. The performance of the random algorithm is the worst since the negative link accounts for a small proportion of the overall network. The shortest path algorithm has a larger improvement than the random algorithm, which indicates that the negative link is more likely to exist in the network at a very close distance. The accuracy of the negative interaction determination algorithm has increased dramatically, which indicates that there is a strong link between negative interactions and negative links. The balanced negative interaction determination algorithm improves the accuracy rate compared to the negative interaction determination algorithm, which means that the balance theory does improve the accuracy rate by removing some points that do not meet the balance theory. The accuracy of our proposed negative link prediction algorithm is slightly higher than that of the balanced negative interaction determination algorithm. It indicates that when the negative link prediction algorithms take the sentiment characteristics into account, they can improve the accuracy of the prediction and have a good performance.

Conclusions
This paper focuses on the problem of negative link prediction in symbol networks. We propose a negative link prediction algorithm by using the sentiment analysis and structural balance theory. The sentiment analysis is mainly embodied in the construction of the user interaction matrix based on the calculation of the sentiment intensity of social network texts. Based on the user interaction matrix, we construct the reliability weight matrix (RW-matrix). Then, based on the structural balance theory and constructed RW-matrix, we propose the negative link prediction algorithm by building the negative link sample set and extracting the features. With the experiments and conductions with the real dataset, the influence of each parameter on the accuracy of the prediction results has been analyzed based on the control variable method. Compared with the existing predictive algorithms, the proposed negative link prediction algorithm can improve the accuracy of prediction dramatically with good performances.

Data Availability
The Epinions dataset used to support the findings of this study can be available from http://www.trustlet.org/ epinions.html.