Public opinion on social media increasingly influences and supervises trials, so assessing the risk it poses is increasingly important for refined trial management. However, the sheer volume of public opinion and the scarcity of historical trial-procedure logs make this assessment difficult. To address this, we propose an adaptive multifactor risk assessment framework for public opinion based on fuzzy numbers. First, we establish a multilayer indicator model for assessing the risk of public opinion (POR) with multilayer analysis and decision methods. Then, we mine the association rules hidden in the process logs to update the indicator model periodically. Moreover, to deal with big data on social media, we design a public opinion analysis module for indicator evaluation that covers public opinion sentiment, hot search, and social media coverage. In particular, public opinion sentiment is classified by a topic-based BiLSTM (T-BiLSTM), which achieves higher accuracy than the traditional BiLSTM in our experiments. Finally, fuzzy number similarity is employed to determine the POR level in the nine-level risk system. Experimental results validate the efficiency of our framework in assessing the POR.
Serious and complicated cases bring severe challenges to trial management nowadays. Some of them attract much attention because of their case type, related parties, or well-known judges. At the same time, people are used to expressing their opinions on such cases on platforms such as Facebook, WeChat, Weibo, and Twitter. This mass of public opinion has both positive and negative impacts on trial procedures. Hence, public opinion assessment and supervision are crucial for credible trials. Public opinion on social media has its own characteristics, such as large volume, fast propagation, and chaotic content. Furthermore, the mass data on social media contains the inherent information we are concerned about. By analyzing multisource public opinion comprehensively, we can figure out its propagation mode and raise POR warnings earlier. Therefore, POR assessment helps courts respond early to negative public opinion and act more proactively. Accomplishing this involves two main tasks. One is to handle public opinion with big data theory, and the other is to conduct the risk assessment with insufficient historical data.
For the explosive comments that emerge on social media, sentiment analysis has become a research hotspot. Besides, sentiment analysis on comments about hot cases plays a vital role in promoting trial management. Thus, it is crucial to carry out an efficient analysis and supervision method for comments about cases. So far, research on machine learning-based sentiment analysis has a lot of achievements, such as KNN [
For risk assessment, due to insufficient historical data, together with the fuzziness and uncertainty of risks, researchers adopt a fuzzy set theory to analyze the risk [
However, several challenges remain in assessing the POR. Firstly, there is no suitable indicator model for this task. An efficient assessment relies on fine-grained indicators and objective indicator weights, and obtaining both remains unsolved. Secondly, comments on social media about cases in trial have many characteristics that make them hard to analyze. Hence, much work remains to ensure the accuracy of sentiment classification for this specific use. Thirdly, evaluating risks quantitatively is crucial but not easy.
To address these issues, this paper implements a Risk Assessment framework on Public Opinion for Trial management (RAPOT). The framework provides a fine-grained risk assessment based on fuzzy numbers. By computing fuzzy number similarities, the framework decides its risk level in the nine-level assessment system. Our main contributions in this paper are as follows:
Fine-Grained Risk Rating System. We employ fuzzy number similarities to achieve risk assessment with little historical data in trial procedure management. First, a multilayer risk indicator model is established based on the analytic hierarchy process (AHP) method and the extended technique for order preference by similarity to an ideal solution (extended TOPSIS) method. The model contains a fine-grained indicator layer, in which each element consists of a risk indicator and its impact factor. When assessing the risks, we transform both impact factors and indicator values into fuzzy numbers. Then, we aggregate the fuzzy numbers into one and rank the aggregated number in the nine-level assessment system.
Adaptive Indicator Model. Considering that the system logs accumulated during trial processing contain many latent association rules of the procedures, we propose the RApriori algorithm to mine these rules. The mined rules are used to update the indicator model, which improves its applicability and robustness.
Efficient Comment Sentiment Analysis. We define three kinds of input sources and submodules for indicator evaluation. In particular, the sentiment of public opinion is classified based on topics. The sentiment analysis that we propose consists of single-pass-based topic clustering and T-BiLSTM-based sentiment analysis. Analyzing sentiment per topic is more precise and more comprehensive. Besides, our framework includes further indicators such as the topic's heat and the media coverage.
Experimental Evaluations. To demonstrate the performance of RAPOT, we conduct a case study with three cases that have recently attracted much attention. The results illustrate that our framework is applicable and efficient in practical cases and yields a reasonable assessment level.
The rest of this paper is structured as follows. We talk about the related work in Section
Due to the fuzziness and uncertainty of risks, researchers adopt a fuzzy set theory to analyze the risk. The theory of fuzzy numbers has been widely applied in risk analysis [
The existing fuzzy number similarity-based methods generally have three main modules: the risk indicator model, risk aggregation, and risk level determination. Among them, the fuzzy number similarity calculation is the key to determining the risk level precisely. To compute fuzzy number similarities, researchers have defined various features of generalized fuzzy numbers (GFNs) to distinguish the numbers, such as the center of gravity (COG) [
In this section, we discuss the critical issues while assessing the POR. Firstly, we present the risk indicator model in Sections
Figure
The framework of RAPOT.
To overcome the lack of historical data, we employ AHP and extended TOPSIS to construct an initial risk indicator model. The hierarchy model defines a number of risk indicators along with their impact factors. Figure
The flowchart of risk indicator model initialization.
AHP is an efficient multilayer analysis and decision method [
1-9 scales of relative importance [
Intensity of importance | Definition |
---|---|
1 | Equal importance |
2 | Weak |
3 | Moderate importance |
4 | Moderate plus |
5 | Strong importance |
6 | Strong plus |
7 | Very strong or demonstrated importance |
8 | Very, very strong |
9 | Extreme importance |
At first, we refer to expertise, existing laws, regulations, and classical hot cases and form the set of risks as
Objective Layer (OL). Risk assessment of public opinion for trial management is the objective of our work. We need to figure out the impacts of public opinion on the trial procedure.
Criteria Layer (CL). The elements in this layer are the judge, the parties involved, the case, and the public opinion. The expert group defines the elements by referring to the existing documents.
Indicator Layer (IL). This layer contains the indicators through which public opinion would impact the trial procedure. Each indicator belongs to its parent element in the criteria layer.
The indicator model for the POR.
After that, an evaluation dataset is collected to obtain the indicators' impact factors, where the impact factor represents the indicator's weight when integrating the POR. To evaluate the impact factor accurately, the expert compares the risk indicators in pairs to complete a comparison matrix as
When
The eigenvector of the approved evaluation matrix gives a ranking of the risk indicators by their impact factors. For risk assessment with fuzzy numbers, the expert assigns a linguistic term in LT ={“AbsolutelyLow (AL)”, “VeryLow (VL)”, “Low (L)”, “FairlyLow (FL)”, “Medium (M)”, “FairlyHigh (FH)”, “High (H)”, “VeryHigh (VH)”, “AbsolutelyHigh (AH)”} to each risk indicator based on this order.
Several law experts evaluate the impact factors according to our hierarchical structure and construct an evaluation dataset. The dataset contains several evaluation items
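The AHP weighting step can be illustrated with a minimal sketch. It assumes a hypothetical 3×3 pairwise comparison matrix on the 1-9 scale, approximates the principal eigenvector by power iteration, and checks Saaty's conventional consistency ratio threshold of 0.1; the matrix entries and the function names are illustrative and are not taken from our evaluation dataset.

```python
# Saaty's random consistency index for matrix orders 1..9.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]


def principal_eigen(matrix, iterations=100):
    """Approximate (lambda_max, normalized eigenvector) by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    lambda_max = float(n)
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        # weights sum to 1, so the sum of (A w) estimates lambda_max.
        lambda_max = sum(product)
        weights = [value / lambda_max for value in product]
    return lambda_max, weights


def consistency_ratio(lambda_max, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    if n <= 2:
        return 0.0
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n - 1]


# Hypothetical pairwise comparisons of three indicators (reciprocal matrix).
comparison = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
lambda_max, weights = principal_eigen(comparison)
cr = consistency_ratio(lambda_max, len(comparison))
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

In this sketch, a matrix whose consistency ratio is below 0.1 would be treated as approved, and `ranking` gives the order in which linguistic terms could then be assigned.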
First, an evaluation matrix with linguistic terms is established based on the dataset as
The transform from linguistic terms to the fuzzy numbers [
Linguistic terms | Generalized fuzzy numbers |
---|---|
AbsolutelyLow | (0.0, 0.0, 0.0, 0.0; 1.0) |
VeryLow | (0.0, 0.0, 0.02, 0.07; 1.0) |
Low | (0.04, 0.1, 0.18, 0.23; 1.0) |
FairlyLow | (0.17, 0.22, 0.36, 0.42; 1.0) |
Medium | (0.32, 0.41, 0.58, 0.65; 1.0) |
FairlyHigh | (0.58, 0.63, 0.80, 0.86; 1.0) |
High | (0.72, 0.78, 0.92, 0.97; 1.0) |
VeryHigh | (0.93, 0.98, 1.0, 1.0; 1.0) |
AbsolutelyHigh | (1.0, 1.0, 1.0, 1.0; 1.0) |
In the extended TOPSIS, the positive and negative ideal solutions are
Then, the distance between
Similarly, the geometric distance between
Finally, the impact factor of indicator
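As a rough illustration of how an impact factor can be derived with a TOPSIS-style closeness coefficient, the sketch below makes several assumptions that are not confirmed by the text above: the positive and negative ideal solutions are taken as (1, 1, 1, 1) and (0, 0, 0, 0), the distance between two trapezoidal fuzzy numbers is the vertex (root mean square) distance, and the expert ratings for the indicators `X1` to `X3` are hypothetical. The fuzzy numbers come from the transform table, with the height 1.0 omitted.

```python
from math import sqrt

# Trapezoid points of selected linguistic terms from the transform table.
GFN = {
    "L": (0.04, 0.10, 0.18, 0.23),
    "M": (0.32, 0.41, 0.58, 0.65),
    "FH": (0.58, 0.63, 0.80, 0.86),
    "H": (0.72, 0.78, 0.92, 0.97),
}

# Assumed ideal solutions (not taken from the source).
POSITIVE_IDEAL = (1.0, 1.0, 1.0, 1.0)
NEGATIVE_IDEAL = (0.0, 0.0, 0.0, 0.0)


def vertex_distance(a, b):
    """Root mean square distance between two trapezoidal fuzzy numbers."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def closeness(terms):
    """Closeness coefficient of one indicator over all expert ratings."""
    d_pos = sum(vertex_distance(GFN[t], POSITIVE_IDEAL) for t in terms)
    d_neg = sum(vertex_distance(GFN[t], NEGATIVE_IDEAL) for t in terms)
    return d_neg / (d_pos + d_neg)


# Hypothetical ratings: one linguistic term per expert for each indicator.
ratings = {
    "X1": ["H", "H", "FH"],
    "X2": ["M", "FH", "M"],
    "X3": ["L", "M", "L"],
}
impact = {name: closeness(terms) for name, terms in ratings.items()}
```

Under these assumptions, an indicator rated consistently high by the experts obtains a closeness coefficient near 1, and one rated low obtains a value near 0.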
Because the trial process is strict and complicated, the initial indicator model can hardly remain applicable to POR assessment over time. Also, the system logs accumulated during trial processing contain many latent association rules of the procedures. Figure
Require: system logs generated from T1 to T2
Ensure: association rules
A fragment of the trial process.
In the algorithm, we assign numerical codes to both process nodes and risk confirm nodes based on their sequence in trial. Firstly, the search of latent association rules always starts from a frequent risk confirm node
The procedures of joinSet operation.
The RApriori method is executed regularly, and the searched association rules are added to update the indicator model of POR. The experimental results show that our algorithm decreases the computational complexity significantly.
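For reference, the classical Apriori baseline that RApriori is compared against can be sketched as follows. This is the textbook level-wise algorithm, not RApriori itself, which restricts the search to rules starting from frequent risk confirm nodes. The log entries are hypothetical sets of numerical node codes, and `min_support` is an assumed absolute count.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        previous = list(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical log records, each a set of node codes seen in one case.
logs = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 3}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]
frequent_sets = apriori(logs, min_support=2)
```

The join and prune steps scan all candidate combinations at every level, which is the cost that a search anchored at risk confirm nodes aims to avoid.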
Besides the impact factor, we have to calculate the probability of indicator occurrence, which we call the indicator value. The data sources for value computing fall into three categories: (1) social media, (2) manual input, and (3) document analysis. For indicator C3.1, the judge can report the POR during the trial. As for C1.2, C1.4, C2.1, and C3.3, the indicator values are determined by the other subsystems in the TPMS, for instance, the case division system. Apart from them, the values of indicators C1.1, C1.3, C2.2, and C3.2 are inferred from the social media analysis module. Figure
Analysis in Public Opinion Sentiment. This part explores how interested people are in the case and how intensely they discuss the related topics. If the public cares much about the case and expresses negative sentiment, the indicator value will be large. Otherwise, the indicator value will be close to zero.
Analysis in Hot Search. Frequent searches for the judge or the parties on social media are an important indicator that the case may carry POR during the trial.
Analysis in Media Coverage. If a media outlet on our maintained important-media list has taken part in the related topic, the case's media coverage increases. The POR level rises once the coverage reaches a threshold.
The structure of the social media analysis module.
In this section, we mainly describe the topic-based analysis of public opinion sentiment. The comments collected from social media about a case are first divided into topics. Then, the texts and their topics are fed into a neural network to train the classifier used to analyze the sentiment. The details are as follows.
Firstly, a short text is split into a word sequence
Single-pass clustering [
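The single-pass idea can be sketched as follows: each incoming text is compared with the existing topics and either joins the most similar one or starts a new topic. The sketch assumes bag-of-words vectors, cosine similarity, and a hypothetical threshold of 0.3; the text representation and threshold actually used in the framework may differ.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(texts, threshold=0.3):
    """Assign each word sequence to a topic in one pass; return topic labels."""
    centroids = []  # one accumulated term-count vector per topic
    labels = []
    for words in texts:
        vector = Counter(words)
        best, best_sim = -1, 0.0
        for index, centroid in enumerate(centroids):
            sim = cosine(vector, centroid)
            if sim > best_sim:
                best, best_sim = index, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vector)  # merge the text into the topic
            labels.append(best)
        else:
            centroids.append(Counter(vector))  # open a new topic
            labels.append(len(centroids) - 1)
    return labels


# Hypothetical, already segmented short texts.
texts = [
    ["judge", "verdict", "unfair"],
    ["verdict", "judge", "appeal"],
    ["traffic", "accident", "driver"],
    ["driver", "accident", "compensation"],
]
labels = single_pass(texts)
```

Because every text is processed once and the number of topics is not fixed in advance, this style of clustering suits a continuous stream of comments.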
Since BiLSTM [
The structure of T-BiLSTM.
The public opinion sentiment for topics is defined as
Here,
In this section, we describe the fuzzy number similarity-based risk assessment module which evaluates the risk level in the nine-level risk system. At first, the risk indicator evaluations we talk about in Section
Here,
As Figure
The comparison in attenuation of similarities with increasing distance for the five methods.
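To make the aggregation and level determination steps concrete, the sketch below uses two simplifications that are assumptions rather than the method of this section: impact factors are defuzzified to crisp weights by averaging their four trapezoid points, and the similarity is the simple distance-based measure 1 - Σ|a_i - b_i|/4. The fuzzy numbers are those of the transform table (heights omitted), and the inputs are the impact factors and Case 1 values from the indicator evaluation table of the case study.

```python
# Trapezoid points of the nine linguistic terms (all heights are 1.0).
LEVELS = {
    "AL": (0.0, 0.0, 0.0, 0.0),
    "VL": (0.0, 0.0, 0.02, 0.07),
    "L": (0.04, 0.10, 0.18, 0.23),
    "FL": (0.17, 0.22, 0.36, 0.42),
    "M": (0.32, 0.41, 0.58, 0.65),
    "FH": (0.58, 0.63, 0.80, 0.86),
    "H": (0.72, 0.78, 0.92, 0.97),
    "VH": (0.93, 0.98, 1.0, 1.0),
    "AH": (1.0, 1.0, 1.0, 1.0),
}


def crisp_weight(term):
    """Assumed defuzzification: mean of the four trapezoid points."""
    points = LEVELS[term]
    return sum(points) / len(points)


def aggregate(impact_terms, value_terms):
    """Weighted mean of the indicator values with crisp weights."""
    weights = [crisp_weight(t) for t in impact_terms]
    total = sum(weights)
    return tuple(
        sum(w * LEVELS[v][k] for w, v in zip(weights, value_terms)) / total
        for k in range(4)
    )


def similarity(a, b):
    """Assumed simple measure: 1 minus the mean absolute point difference."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / 4.0


def risk_level(impact_terms, value_terms):
    """Return the linguistic level most similar to the aggregated risk."""
    aggregated = aggregate(impact_terms, value_terms)
    return max(LEVELS, key=lambda name: similarity(aggregated, LEVELS[name]))


# Indicators C1.1 to C3.3 in table order.
IMPACT = ["VL", "H", "H", "M", "H", "M", "FH", "M", "L"]
CASE_1 = ["VL", "AL", "M", "AL", "AH", "AH", "AL", "AL", "AL"]
level = risk_level(IMPACT, CASE_1)
```

With these simplifications the sketch selects "FL" (Fairly low) for Case 1. The similarity values it produces are not comparable with those of the measures discussed in this section, since the measure is different.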
In this section, we discuss the results of the three experiments: (A) efficiency of algorithm RApriori, (B) efficiency of the classifier T-BiLSTM, and (C) the case study of the whole framework RAPOT.
To validate the efficiency of RApriori, we compare it with the classical Apriori and FP-Growth. There are three subexperiments in this section: (a) time costs with different rule lengths, (b) time costs with different rule counts, and (c) time costs with different datasets. We carry out these experiments on the simulation datasets generated with the parameters shown in Table
The parameters of simulation datasets.
Parameters | Value |
---|---|
Count of process nodes | 80 |
Count of confirm nodes | 80 |
Error rate | 0.15 |
Confirm rate | 0.15 |
We train the classifier for public opinion sentiment analysis on a dataset of 18000 positive comments and 18000 negative comments collected from Weibo. The validating set has 3600 positive items and 3600 negative items. In addition, we compare the T-BiLSTM-based sentiment classifier with KNN, maximum entropy (ME), Bayes, SVM, and the traditional BiLSTM. We adopt accuracy, positive-precision, positive-recall, and Macro-F1 as the evaluation metrics that are defined as
The comparison in accuracy of the five classifiers.
Classifier | Acc | Precision | Recall | Macro-F1 |
---|---|---|---|---|
T-BiLSTM | 0.88 | 0.90 | 0.85 | 0.88 |
BiLSTM | 0.87 | 0.88 | 0.86 | 0.87 |
ME | 0.81 | 0.82 | 0.79 | 0.81 |
Bayes | 0.84 | 0.81 | 0.89 | 0.84 |
KNN | 0.73 | 0.67 | 0.90 | 0.72 |
SVM | 0.80 | 0.79 | 0.82 | 0.80 |
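The four metrics in the table above can be computed from confusion-matrix counts, as in the following sketch. It assumes that Macro-F1 is the unweighted mean of the positive-class and negative-class F1 scores. The counts are hypothetical: they are not reported in this paper and are only chosen to be consistent with a balanced validating set of 3600 positive and 3600 negative items.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, positive precision/recall, and Macro-F1 from raw counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)  # positive-precision
    recall = tp / (tp + fn)     # positive-recall
    f1_positive = 2 * precision * recall / (precision + recall)
    # Treat the negative class as the target to get its own F1 score.
    neg_precision = tn / (tn + fn)
    neg_recall = tn / (tn + fp)
    f1_negative = 2 * neg_precision * neg_recall / (neg_precision + neg_recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "macro_f1": (f1_positive + f1_negative) / 2,
    }


# Hypothetical counts on a 3600/3600 validating set.
metrics = binary_metrics(tp=3060, fp=340, tn=3260, fn=540)
rounded = {name: round(value, 2) for name, value in metrics.items()}
```

With these illustrative counts the rounded values are 0.88, 0.90, 0.85, and 0.88, which coincide with the T-BiLSTM row of the table.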
In this section, we evaluate the efficiency and applicability of RAPOT with a case study. It includes three sets of short texts corresponding to three cases; the sizes of the three sets are 764, 306, and 156. At first, the risk indicator model of RAPOT is shown as Figure
Indicator evaluation for the three cases.
Indicators | Impact factors | Case 1 | Case 2 | Case 3 |
---|---|---|---|---|
C1.1 | VL | VL | FH | FH |
C1.2 | H | AL | AL | AL |
C1.3 | H | M | L | VL |
C1.4 | M | AL | AH | AH |
C2.1 | H | AH | AH | AL |
C2.2 | M | AH | AH | AL |
C3.1 | FH | AL | AH | AL |
C3.2 | M | AL | AL | AL |
C3.3 | L | AL | AL | AL |
Table
The results of fuzzy similarities.
Risk level | Case 1 | Case 2 | Case 3 |
---|---|---|---|
Absolutely low | 0.4970 | 0.3506 | 0.5813 |
Very low | 0.5229 | 0.3412 | 0.6386 |
Low | 0.6917 | 0.4318 | 0.8592 |
Fairly low | 0.9146 | 0.5872 | 0.7900 |
Medium | 0.6866 | 0.7997 | 0.5561 |
Fairly high | 0.4700 | 0.7347 | 0.3880 |
High | 0.3895 | 0.5955 | 0.3246 |
Very high | 0.3185 | 0.4770 | 0.2712 |
Absolutely high | 0.3108 | 0.4881 | 0.2616 |
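The final risk level of each case is the level with the largest similarity in the table above. The short sketch below reads the reported similarities and picks that level per case; the variable and function names are illustrative.

```python
# Reported fuzzy similarities, per risk level, for Case 1, Case 2, Case 3.
SIMILARITIES = {
    "Absolutely low": (0.4970, 0.3506, 0.5813),
    "Very low": (0.5229, 0.3412, 0.6386),
    "Low": (0.6917, 0.4318, 0.8592),
    "Fairly low": (0.9146, 0.5872, 0.7900),
    "Medium": (0.6866, 0.7997, 0.5561),
    "Fairly high": (0.4700, 0.7347, 0.3880),
    "High": (0.3895, 0.5955, 0.3246),
    "Very high": (0.3185, 0.4770, 0.2712),
    "Absolutely high": (0.3108, 0.4881, 0.2616),
}


def assessed_level(case_index):
    """Return the risk level with the largest similarity for one case."""
    return max(SIMILARITIES, key=lambda level: SIMILARITIES[level][case_index])


assessed = [assessed_level(i) for i in range(3)]
```

This yields "Fairly low" for Case 1, "Medium" for Case 2, and "Low" for Case 3.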
The comparison in results of five similarity measure methods.
Time costs with different rule lengths.
Time costs with different rule counts.
Time costs with different datasets.
Accurate and fine-grained risk assessment of public opinion in the trial procedure is crucial for refined trial management. The framework proposed in this paper provides an objective and efficient assessment of POR in trials without requiring a large amount of historical data, which is rarely available. We also propose T-BiLSTM to analyze public opinion sentiment based on topics, which is more comprehensive than the traditional BiLSTM in practice. The risk assessment framework for POR consists of three modules: (1) an adaptive multifactor indicator model for POR assessment, (2) the indicator evaluation module with an accurate public opinion analysis, and (3) the objective risk ranking module. The experimental results show the efficiency and practicability of our framework. In the future, we will exploit the considerable amount of processing logs in the TPMS to further improve the adaptability and robustness of our indicator model.
The dataset used to support the findings of this study is available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.
The authors gratefully acknowledge the support of the National Key R&D Program of China under grant No. 2018YFC0830500.