RnRTD: Intelligent Approach Based on the Relationship-Driven Neural Network and Restricted Tensor Decomposition for Multiple Accusation Judgment in Legal Cases

Using intelligent judgment technology to assist judgment is an inevitable trend in the handling of contemporary legal cases. Using big data and artificial intelligence to accurately determine the multiple accusations involved in a legal case is an urgent problem in legal judgment. Solving it hinges on two points, namely, (1) characterization of legal cases and (2) classification and prediction of legal case data. Traditional methods of entity characterization rely on feature extraction, which is often based on vocabulary and syntax information. As a result, they require extensive manual effort and generalize poorly, which adds computation and imposes limitations on subsequent classification algorithms. This study proposes an intelligent judgment approach called RnRTD, which is based on the relationship-driven recurrent neural network (rdRNN) and restricted tensor decomposition (RTD). We represent legal cases as tensors and propose an innovative RTD method. RTD has low dependence on vocabulary and syntax and extracts the feature structure that is most favorable for improving the accuracy of the subsequent classification algorithm. RTD maps the tensors that represent legal cases into a specific feature space and transforms each original tensor into a core tensor and its corresponding factor matrices. This study uses rdRNN to continuously update and optimize the constraints in RTD so that rdRNN achieves the best legal case classification effect in the target feature space generated by RTD. Simultaneously, rdRNN sets up a new gate and a similar case list to represent the interaction between legal cases. In comparison with traditional feature extraction methods, the proposed RTD method is less expensive and more universal in the characterization of legal cases. Moreover, rdRNN with an RTD layer performs better than a plain recurrent neural network (RNN) on the classification and prediction of multiple accusations in legal cases. Experiments show that, compared with previous approaches, our method achieves higher accuracy in the classification and prediction of multiple accusations in legal cases, and our algorithm is more interpretable.


Introduction
In contemporary society, the demand for big data assistance in the judgment of legal cases, such as case intelligence research [1] and judgment [2], big data comprehensive supervision, and assistance in handling legal cases, is increasing with the development of big data and artificial intelligence technology. Researchers are committed to creating an "intelligent legal case judgment" project that combines big data and artificial intelligence. Multiaccusation judgment of legal cases is an important part of realizing such a project. Multiaccusation judgment technology applies big data and artificial intelligence to judgment making, legal case handling [3], and public services. Big data provides recognized standards for judging legal cases and helps avoid different judgment results in similar legal cases. Artificial intelligence reduces human subjectivity, analyzes cases scientifically and accurately from the perspective of both cases and laws, and helps judges make objective judgments. Using big data and artificial intelligence to accurately judge multiple accusations in different legal cases involves two main points, namely, (1) construction of a comprehensive and accurate characterization method for legal cases and (2) realization of a classification and prediction algorithm for the multiple accusations involved in a large number of legal cases. Figure 1 shows the process of multiaccusation classification for legal cases. Traditional methods of entity characterization often model an entity by tagging it. However, these feature extraction methods are highly dependent on the vocabulary and syntax of the entity dataset and require heavy manpower and material resources, and the generality of the tagged model is poor. In addition, feature extraction methods based on vocabulary and syntax require strong expert knowledge as support. The resulting entity characterization considerably limits the subsequent classification algorithm, and the algorithm's accuracy becomes highly volatile.
This study proposes an intelligent legal case judgment technique called RnRTD, which is based on the relationship-driven recurrent neural network (rdRNN) and restricted tensor decomposition (RTD). Figure 2 shows the framework of our approach. We represent legal case data as tensor χ and propose an RTD technique. RTD is less dependent on vocabulary and syntax than traditional feature extraction methods, and it focuses more on extracting the information of potential structures in legal case tensors. RTD maximizes the accuracy of rdRNN by combining text and structural information. RTD maps legal case tensor χ into a specified feature space Z: under the restricted condition η, it decomposes the original tensor χ into a core tensor χ̂ and its corresponding factor matrix set C_k. The obtained core tensor χ̂ represents the tensor structure information that is most helpful in improving the accuracy of the rdRNN classification algorithm. Core tensor χ̂ can be interpreted as the most advantageous feature structure in χ for rdRNN. RTD is thus an important feature extraction and dimensionality reduction operation. This study uses rdRNN to update and optimize the restricted condition η in RTD iteratively so that the feature space Z continually approaches an ideal region, thus enabling rdRNN to achieve an optimal effect in the classification of multiple accusations in legal cases.
Compared with traditional feature extraction methods, RTD obtains a legal case characterization containing tensor element and structural information that is more conducive to improving the accuracy of the rdRNN classification algorithm, and it has lower dependence on vocabulary, syntax, and expert knowledge. That is, the RTD legal case characterization model has better universality and fewer requirements on the dataset format in comparison with traditional feature extraction methods. Compared with the direct use of the original legal case tensor χ as the input of the RNN classification algorithm, rdRNN with an RTD layer has a better effect on the classification of multiple accusations in legal cases. The main reason is that rdRNN constantly updates and optimizes the RTD restricted condition η, thereby enabling RTD to point to the feature space Z where rdRNN has the best effect in legal case classification.
The main contributions of this study are summarized in the following points:
(i) This study uses a new method of characterizing legal cases. It expresses a legal case as a tensor and proposes an RTD method that maps the original legal case tensor into a new feature space. RTD extracts from the original legal case tensor the tensor structure and text information that are favorable for the subsequent classification algorithm. RTD also extracts valuable tensor features and reduces tensor dimensions. The core tensor obtained by RTD is interpreted as the most valuable tensor structure and textual feature information extracted from the original legal case tensor for the rdRNN classification algorithm.
(ii) This study proposes rdRNN, which is a new approach for the intelligent judgment of multiple accusations in legal cases. On the basis of the original neural networks, we add a new gate and a similar case list to control the interaction between tensors of legal cases. rdRNN is designed specifically for the intelligent judgment of multiple accusations in legal cases. It fully considers the impact of the relationship between legal cases on the judgment results of such cases. For example, highly similar legal cases are likely to have similar judgment results and vice versa.
(iii) This study proposes a neural network-based method for the optimization of the restricted tensor. The restricted tensor is a bridge between the RTD algorithm and rdRNN. rdRNN controls the tensor decomposition process by optimizing the restricted tensor, which guides the core tensor along the direction that is most conducive to improving the accuracy of the classification model. We derive the partial derivative of the loss function in rdRNN with respect to the restricted tensor and realize the optimization of the restricted tensor by the neural network.
Section 2 gives the recent research progress on the classification of multiple crimes in legal cases. Section 3 introduces related definitions and the concepts involved in this study. Section 4 introduces the proposed approach for the judgment of multiple accusations in legal cases. Section 5 provides the experimental results and analysis of this study, and Section 6 presents a detailed discussion of the proposed method.

Related Work
With the advent of the era of big data and the development of artificial intelligence technology [4], the emergence of deep neural networks provides great prospects for accurate classification and prediction [5]. Neural network-based knowledge representation and reasoning methods enable deep learning approaches to be applied to many scenarios [6]. For the legal field, the combination of artificial intelligence and law has become an inevitable trend [7]. However, current research in this area mainly focuses on legal case modeling [8], legal case document retrieval [9], legal consultation question-and-answer systems [10], and legal case similarity reasoning work [2]. Little research has been conducted on the multiaccusation determination of cases in the legal field.
Bartolini et al. proposed a semantic annotation method for indexing and retrieving legal texts [11]. The method uses a specific segment extraction and text classification algorithm to automatically semantically mark legal documents. Aleven developed a computational model based on artificial intelligence algorithms and professional legal knowledge [2]. The model determines the correlation between cases based on the context and problem scenarios of the case. Joshi et al. proposed a text mining method for electronic evidence review of legal cases [12]. The method uses semantic topic and text classification technology to repeatedly detect the feature vocabulary in legal documents and then automatically segments and screens the documents, avoiding the manual work of legal analysts.
Sulea et al. proposed a legal case judgment system based on an SVM classifier [13]. The method uses machine learning techniques to predict the legal field to which the legal case belongs and the outcome of its judgment. By accurately extracting the features of legal cases, the method can roughly predict the specific date of the case. Brüninghaus and Ashley proposed a text classification method based on facts of legal cases [14]. The method uses artificial intelligence algorithms and legal background knowledge to predict the outcome of legal cases. The method extracts facts of legal cases, indexes and models them according to the features, and finally completes the classification of legal cases. The critical part of predicting legal case judgments is case modeling and case classification. Traditional text modeling methods are based on feature tags, which rely heavily on the syntax and semantic information of the source data. Labeling features requires a lot of manual work and expert knowledge. Therefore, the text classification algorithm formed on this basis is not scalable, and the accuracy is highly volatile.

Preliminaries
This section introduces the related methods, definitions, and background knowledge involved in this study. Section 3.1 presents the basic notations and definitions. Section 3.2 provides a formal representation of the tensor decomposition problem. Section 3.3 introduces the calculation process of forward propagation in bidirectional long short-term memory (Bi-LSTM). Section 3.4 presents a formal description of the intelligent legal judgment problem to be solved in this study.

Definitions and Notations.
This section describes the relevant notations and definitions required in this work. Tensors are multidimensional arrays [15], which we represent in Euler script letters, such as χ and 𝒴. We refer to the dimensions as tensor modes and to the number of a tensor's modes as its order. We denote scalars by lowercase letters (such as a, b) and vectors by boldface lowercase letters (such as c, d). We denote matrices by capital letters, such as A and B. We use A^T to represent the transpose of matrix A. We express the identity matrix as I, the identity tensor as τ, and the matrix with all elements equal to 1 as 1. Table 1 shows all the required notations and definitions.

Definition 1 (outer product).
The outer product of vectors a ∈ ℝ^I and b ∈ ℝ^J is denoted as A = a ∘ b, where A ∈ ℝ^{I×J} and A(i, j) = a(i)b(j). (1)

Definition 2 (elementwise multiplication).
The elementwise multiplication of vectors a ∈ ℝ^I and b ∈ ℝ^I is denoted as A = a * b, where A ∈ ℝ^I and A(i) = a(i)b(i). In another case, the elementwise multiplication of vector a ∈ ℝ^I and matrix A ∈ ℝ^{I×J} is denoted as Z = a * A, where Z ∈ ℝ^{I×J} and Z(i, j) = a(i)A(i, j). (2)

Definition 5 (n-mode matricization).
Given an N-mode tensor χ ∈ ℝ^{I_1×I_2×···×I_N}, χ can be matricized in N forms, one for each mode. We denote the n-mode matricization of χ as χ_(n), where χ_(n) ∈ ℝ^{I_n×(I_1···I_{n−1}I_{n+1}···I_N)}. χ_(n) is obtained by keeping the nth mode unchanged while expanding and concatenating the slices of the remaining modes into a matrix.
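As an illustration of Definition 5, the following NumPy sketch matricizes a small three-mode tensor. The helper names `unfold` and `fold` are hypothetical, and the column ordering follows NumPy's row-major layout, which is an assumption and may differ from the ordering convention used in [15].

```python
import numpy as np

def unfold(tensor, mode):
    """n-mode matricization: rows index `mode`, columns index all other modes."""
    # Bring the chosen mode to the front, then flatten the remaining modes.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor whose full shape is `shape`."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape([shape[mode]] + rest), 0, mode)

# A 2 x 3 x 4 tensor has three matricizations: 2 x 12, 3 x 8, and 4 x 6.
X = np.arange(24).reshape(2, 3, 4)
X1 = unfold(X, 1)                 # keeps mode 1 (size 3) as the row index
X_back = fold(X1, 1, X.shape)     # recovers the original tensor
```

Matricization only rearranges entries, so folding the result recovers the original tensor exactly.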
Definition 6 (Frobenius norm of a tensor).
Given an N-mode tensor χ ∈ ℝ^{I_1×I_2×···×I_N}, the Frobenius norm of χ is denoted as $\|\chi\|_F = \sqrt{\sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} \chi(i_1, \ldots, i_N)^2}$.

Definition 7 (n-mode stretch).
Given an N-mode tensor 𝒴 ∈ ℝ^{J_1×J_2×···×J_N} and a weight matrix W ∈ ℝ^{J_n×∏_{k≠n} J_k}, the n-mode stretch is defined between 𝒴 and W.

Computational Intelligence and Neuroscience

Definition 8 (n-mode product).
Given an N-mode tensor χ ∈ ℝ^{I_1×I_2×···×I_N} and a matrix C ∈ ℝ^{I_n×J}, their n-mode product is denoted as λ = χ ×_n C, λ ∈ ℝ^{I_1×···×I_{n−1}×J×I_{n+1}×···×I_N}.

Tensor Decomposition.
As shown in Figure 3, tensor decomposition methods [15] decompose the original tensor into a core tensor and a series of corresponding factor matrices. The essence of tensor decomposition is to approximate the original tensor by using the product of the core tensor and the factor matrices. The mathematical description of tensor decomposition is as follows. Given an N-mode tensor χ ∈ ℝ^{I_1×I_2×···×I_N}, the following formula can be obtained by using the tensor decomposition method:

$\chi \approx \tau \times_1 C_1 \times_2 C_2 \cdots \times_N C_N$,

where τ is the core tensor, τ ∈ ℝ^{J_1×J_2×···×J_N}, and C_n is the corresponding factor matrix set, C_n ∈ ℝ^{J_n×I_n}. Each element in C_n is a column orthogonal matrix. τ and C_n also minimize function φ, where

$\varphi = \| \chi - \tau \times_1 C_1 \times_2 C_2 \cdots \times_N C_N \|_F^2$.

Bi-LSTM.
RNNs have far-reaching implications for the study of sequence data [16]. The nodes between the hidden layers of an RNN are connected [17]; that is, the input of the hidden layer contains not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. However, vanishing or exploding gradients often occur when an RNN deals with long-distance dependence, thereby making RNN training difficult. The hidden layer of the original RNN has only one kind of state, which is very sensitive to short-term inputs. Long short-term memory (LSTM) deals with long-distance dependence by adding a long-term memory state to the original RNN [18].
Figure 3: Tensor decomposition.

As shown in Figure 4, we represent the input value of LSTM at time t as x_t, the output value from the previous moment t − 1 as h_{t−1}, and the long-term unit state at time t − 1 as c_{t−1}. We record the unit state input at time t as c̃_t. The output of LSTM at time t comprises two parts, namely, the output value h_t and the unit state c_t at the current time. LSTM sets up three control gates, namely forget, input, and output, to control the long-term unit state c. The forget gate determines how much of the long-term unit state at the previous moment is retained at the current moment. For example, the forget gate f_t at time t determines the weight of c_{t−1} in the calculation of c_t. The input gate determines how much of the input of LSTM is retained in the current long-term unit state. For example, input gate i_t determines the weight that x_t takes in the calculation of c_t. The output gate determines how much the long-term unit state at the current moment contributes to the output of LSTM at the current time. For example, output gate o_t determines the influence of the value of c_t on h_t. The process of forward propagation calculation in LSTM is described as follows:

$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f)$,
$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i)$,
$o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o)$,
$\tilde{c}_t = \tanh(w_c \cdot [h_{t-1}, x_t] + b_c)$.

The long-term unit state c_t at current time t is calculated from f_t, i_t, c_{t−1}, and c̃_t, and the final output of LSTM h_t is calculated from o_t and c_t. That is,

$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t$,
$h_t = o_t * \tanh(c_t)$,

where h_{t−1} is the output of LSTM at time t − 1; x_t is the input of LSTM at time t; σ is the sigmoid function, which is our selected activation function in LSTM; c̃_t is the unit state input at time t; w_f, w_i, and w_o are the weight matrices of the forget gate f_t, the input gate i_t, and the output gate o_t, respectively; and b_f, b_i, and b_o are the bias terms of f_t, i_t, and o_t, respectively. The activation function used in calculating c̃_t is the hyperbolic tangent function, where w_c is the weight matrix and b_c is the bias term.
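The forward propagation described above can be sketched in NumPy as follows. This is a minimal illustration of a standard LSTM cell, assuming that each gate acts on the concatenation [h_{t−1}, x_t]; the function `lstm_step`, the parameter dictionary, and the layer sizes are hypothetical choices for the example, not the configuration used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell."""
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    f_t = sigmoid(params["w_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["w_i"] @ z + params["b_i"])      # input gate
    o_t = sigmoid(params["w_o"] @ z + params["b_o"])      # output gate
    c_tilde = np.tanh(params["w_c"] @ z + params["b_c"])  # candidate unit state
    c_t = f_t * c_prev + i_t * c_tilde                    # long-term unit state
    h_t = o_t * np.tanh(c_t)                              # output at time t
    return h_t, c_t

# Hypothetical sizes: 4 input features, 3 hidden units, a sequence of 5 steps.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = {}
for g in "fioc":
    params[f"w_{g}"] = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
    params[f"b_{g}"] = np.zeros(n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, params)
```

Because the output gate lies in (0, 1) and tanh lies in (−1, 1), every component of h stays strictly inside (−1, 1).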
Bi-LSTM is a bidirectional RNN [19]. The unit state of the hidden layer in Bi-LSTM is calculated from the outputs of forward and backward LSTM. We define the output unit state of Bi-LSTM at time t as h_t^{Bi−LSTM}, the output unit state of forward LSTM as h_t^f, and the output unit state of backward LSTM as h_t^b. By the aforementioned forward propagation formula of LSTM, h_t^{Bi−LSTM} is determined by h_t^f and h_t^b.

Problem Description
Problem 1. We express the legal case as a tensor and classify the legal case according to the judgment result. The category of each legal case is indicated by a scalar, such as r. We are given a legal case dataset Ω that contains legal cases with judgment results, Ω = {(χ^(1), r^(1)), (χ^(2), r^(2)), ···, (χ^(N), r^(N))}. χ^(n) represents the nth legal case in the legal case dataset Ω, and r^(n) indicates the type of legal judgment result that corresponds to the nth legal case. Our goal is to train a case classification model ϕ(χ) that can classify legal cases based on their judgment results.
In this study, legal cases are represented as three-dimensional tensors. As shown in Figure 5, the first dimension represents the basic components of the case, such as the defendant's statement, the plaintiff's statement, the public prosecution, and the court's trial. On this basis, the matrix slice that spans the last two dimensions is the matrix form of the corresponding legal case component. The matrix slice is composed by accumulating the word vectors inside the legal case component. When a case component is matricized, the matrix generally does not include the word vectors of all its words. Instead, we selectively extract words that are valuable for legal case classification. These words can be divided into two categories. The first category usually includes nouns or pronouns, such as characters, times, places, and objects; the second category usually comprises adjectives, numerals, or verbs, such as the means of committing accusations, the degree of harm, and the number of accusations.
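The three-dimensional representation described above can be sketched as follows. This is only an illustration under stated assumptions: the toy vocabulary, the random 8-dimensional embeddings, the zero padding to `max_words`, and the helper `case_to_tensor` are all hypothetical, and the word selection is reduced to a simple lookup in the embedding table.

```python
import numpy as np

def case_to_tensor(components, embed, max_words, dim):
    """Stack one (max_words x dim) word-vector matrix per case component."""
    tensor = np.zeros((len(components), max_words, dim))
    for k, words in enumerate(components):
        # Keep only the selected (embedded) words, truncated to max_words.
        kept = [w for w in words if w in embed][:max_words]
        for j, w in enumerate(kept):
            tensor[k, j] = embed[w]
    return tensor

rng = np.random.default_rng(0)
vocab = ["defendant", "knife", "night", "injured", "two", "stole"]
embed = {w: rng.normal(size=8) for w in vocab}   # hypothetical word vectors

# Two case components, e.g. a defendant's statement and a prosecution text.
components = [
    ["the", "defendant", "stole", "two"],
    ["knife", "night", "injured"],
]
T = case_to_tensor(components, embed, max_words=5, dim=8)
```

The first mode of `T` indexes case components, and each slice `T[k]` is the matrix form of one component, with unused rows left as zeros.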

Our Approach
This study proposes RnRTD for the multiaccusation determination of legal cases. Figure 6 shows the RnRTD framework. First, we extract core tensors from the original tensors using the RTD method. The core tensor approximates the restricted tensor in terms of the tensor structure and elements. Second, we use rdRNN to optimize the restricted tensor so that it guides the core tensor along the direction that is most conducive to improving the accuracy of the classification model.

RTD Method.
This study proposes a new tensor decomposition method called the RTD method. The inputs of the RTD algorithm include the restricted condition tensor η and the tensor χ that represents the legal case. The RTD outputs include the core tensor χ̂ and its corresponding factor matrix sets, namely, C_k and D_k. RTD decomposes χ into a core tensor χ̂ under the action of the restricted condition η. χ̂ approximates η in terms of tensor structure and internal element values. RTD can be interpreted as a mapping of the original tensor χ to the core tensor χ̂. In short, RTD achieves directional decomposition of tensors and extracts vital information from tensors while reducing their dimension. In this study, we define core tensor χ̂ as the most favorable tensor structure and element value information for the subsequent legal case classification algorithm, namely, rdRNN. On this basis, we construct a deep neural network model for RnRTD that is dedicated to legal intelligence judgments.
RTD decomposes the original tensor under the restricted condition so that the obtained core tensor constantly approaches the restricted tensor in terms of tensor structure and element value. Figure 7 illustrates the process, and the problems to be solved by the RTD algorithm are formally described as Problems 2 and 3.
Problem 2. Given tensor χ ∈ ℝ^{I_1×I_2×···×I_K}, restricted tensor η ∈ ℝ^{J_1×J_2×···×J_K}, and its weight w_η, we derive two factor matrix sets, namely, C_k and D_k, with C_k ∈ ℝ^{I_k×H_k} and D_k ∈ ℝ^{J_k×H_k}, such that C_k and D_k minimize the following function:

$\phi = \left\| \chi \prod_{k=1}^{K} \times_k C_k - w_\eta \, \eta \prod_{k=1}^{K} \times_k D_k \right\|_F^2$. (11)

Matrix W_η is preset according to the legal case, W_η ∈ ℝ^{H_n×∏_{k≠n} H_k}. The elements in sets C_k and D_k are orthogonal matrices, that is, they meet the following conditions. For any elements C_k and D_k in sets C_k and D_k,

$C_k^T C_k = I, \quad D_k^T D_k = I$. (12)

In this study, we use the alternating least squares (ALS) algorithm to determine the solution of the objective function ϕ.
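To make the quantities in Problem 2 concrete, the sketch below implements the n-mode product of Definition 8 in NumPy and evaluates an objective of the form ‖χ ∏ ×_k C_k − w_η η ∏ ×_k D_k‖_F², which is the form that Proof 2 expands. Treating w_η as a scalar, the tensor sizes, and the helper names `mode_product`, `multi_mode_product`, and `phi` are assumptions made for illustration only.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """n-mode product of Definition 8: contracts axis `mode` (size I_n)
    with the rows of `matrix` (shape I_n x J)."""
    out = np.tensordot(tensor, matrix, axes=(mode, 0))  # new axis lands last
    return np.moveaxis(out, -1, mode)                   # move it back to `mode`

def multi_mode_product(tensor, matrices):
    """Apply one factor matrix per mode, in order."""
    for k, m in enumerate(matrices):
        tensor = mode_product(tensor, m, k)
    return tensor

def phi(X, eta, Cs, Ds, w_eta):
    """Squared Frobenius distance between the two projected tensors."""
    diff = multi_mode_product(X, Cs) - w_eta * multi_mode_product(eta, Ds)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
I, J, H = (4, 5, 6), (3, 3, 3), (2, 2, 2)   # hypothetical mode sizes
X, eta = rng.normal(size=I), rng.normal(size=J)
# Column-orthonormal factors (C^T C = I) obtained from reduced QR factorizations.
Cs = [np.linalg.qr(rng.normal(size=(i, h)))[0] for i, h in zip(I, H)]
Ds = [np.linalg.qr(rng.normal(size=(j, h)))[0] for j, h in zip(J, H)]
value = phi(X, eta, Cs, Ds, w_eta=0.5)
```

Both projected tensors live in the common space ℝ^{H_1×H_2×H_3}, which is what makes their difference well defined.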
The ALS algorithm can be divided into four steps: (1) randomly pick a variable as a parameter and randomly generate the values of the other variables, (2) determine the partial derivative of the loss function ϕ with respect to the specified parameter while fixing the values of the other variables, (3) set this partial derivative to zero and calculate the value of the specified parameter, and (4) select another variable as the parameter and return to Step (2). The ALS algorithm continues to iterate Steps (2), (3), and (4) until the error of the loss function ϕ falls within the tolerable upper limit.
Problem 2 needs to be solved using Lemma 1. The specific definition and proof of Lemma 1 are provided as follows.

Lemma 1. Given function φ = tr(α^T α), where α = μC_{k_0} and C_k satisfy equation (12). For any element C_{k_0}, ∂φ/∂C_{k_0} = ε μ^T μ C_{k_0}, where ε is a constant. The proof of Lemma 1 is shown in Proof 1.
Proof 1. We use 𝒴 to represent μ^T μ, and we can get that φ = tr(C_{k_0}^T 𝒴 C_{k_0}). According to the function derivation rule, we can obtain the following equation: ∂φ/∂C_{k_0} = (𝒴 + 𝒴^T) C_{k_0} = 2𝒴C_{k_0} = ε μ^T μ C_{k_0}, where ε is a constant, ε = 2.
According to the iterative process of the ALS algorithm, the precondition for solving the values of C_k and D_k that minimize the function ϕ in equation (11) is to calculate ∂ϕ/∂C_{k_0}, where k_0 ∈ [1, K]. Equation (11) shows that ∂ϕ/∂C_{k_0} and ∂ϕ/∂D_{k_0} are solved in the same manner. Proof 2 provides the mathematical proof of the calculation of ∂ϕ/∂C_{k_0}.
Proof 2. We use λ and ϖ to represent χ ∏_{k≠k_0}^{K} ×_k C_k and w_η η ∏_{k}^{K} ×_k D_k, respectively. According to formula (11), we can obtain that ϕ = tr((λ ×_{k_0} C_{k_0} − ϖ)^T (λ ×_{k_0} C_{k_0} − ϖ)). We abbreviate the aforementioned formula as ϕ = tr((λC_{k_0} − ϖ)^T (λC_{k_0} − ϖ)).
Then, we can determine the following formula: ∂ϕ/∂C_{k_0} = (∂tr(ϖ^T ϖ) − 2∂tr(C_{k_0}^T λ^T ϖ) + ∂tr(C_{k_0}^T λ^T λ C_{k_0}))/∂C_{k_0}. According to the function derivation rule, we derive that ∂tr(ϖ^T ϖ)/∂C_{k_0} = 0 and ∂tr(C_{k_0}^T λ^T ϖ)/∂C_{k_0} = λ^T ϖ. In combination with Lemma 1, we can obtain that ∂tr(C_{k_0}^T λ^T λ C_{k_0})/∂C_{k_0} = 2λ^T λ C_{k_0}. Finally, the following formula is determined:

$\partial\phi / \partial C_{k_0} = 2\lambda^T \lambda C_{k_0} - 2\lambda^T \varpi$. (14)

We set the value of equation (14) to 0 and obtain that λ^T λ C_{k_0} = λ^T ϖ. By combining equation (12), we derive the matrix Z given by equation (16). We use the SVD matrix decomposition method to decompose Z and find that Z = PSQ^T. P and Q are orthogonal matrices, P is the left singular matrix, Q is the right singular matrix, and S is a diagonal matrix. After this analysis, the following solution can be obtained:

$C_{k_0} = P Q^T$. (17)

In summary, according to equations (11)-(17), we can derive ∂ϕ/∂C_{k_0} and C_{k_0}, which are described in equations (14) and (17), respectively. ∂ϕ/∂D_k is calculated in the same manner as ∂ϕ/∂C_k. On this basis, we calculate the values of C_k and D_k that minimize the objective function ϕ in formula (11) by using the ALS algorithm.
Algorithm 1 shows the solution of Problem 2 by using the ALS algorithm. The inputs of Algorithm 1 are χ, which represents one legal case, the restricted tensor η, and its weight w_η. In line 2, we randomly initialize the values of C_k and D_k. max_iterations in line 2 represents the maximum number of iterations of the ALS algorithm. The function calcu_Z in line 4 corresponds to equation (16). Lines 5 and 6 show the calculation process of equation (17).
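The SVD step of Algorithm 1 has the same shape as the classical orthogonal Procrustes problem, which the following NumPy sketch solves. It is an illustration rather than the authors' implementation: it assumes a square orthogonal factor, takes Z = A^T B for the objective ‖AC − B‖_F (whether this coincides with equation (16) is an assumption), and the function name `procrustes_update` is hypothetical.

```python
import numpy as np

def procrustes_update(A, B):
    """Orthogonal C minimizing ||A @ C - B||_F (orthogonal Procrustes)."""
    Z = A.T @ B                    # cross term that C must align with
    P, _, Qt = np.linalg.svd(Z)    # Z = P S Q^T
    return P @ Qt                  # C = P Q^T

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 4))
C_true = np.linalg.qr(rng.normal(size=(4, 4)))[0]   # a random orthogonal matrix
B = A @ C_true                                      # noiseless target
C_hat = procrustes_update(A, B)
```

In this noiseless setting the update recovers `C_true`, because the orthogonal polar factor of Z = A^T A C_true is C_true itself. Within ALS, such an update would be applied to one factor matrix at a time while the others are held fixed.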
Another problem to be solved by the RTD algorithm is Problem 3, which is the formal description of the process of tensor decomposition on the original tensor under the action of the restricted tensor and its weight. On the basis of Problem 2, we can obtain factor matrix sets C_k and D_k, which minimize the value of function ϕ in formula (11) while satisfying formula (12).
Problem 3. Given a tensor χ ∈ ℝ^{I_1×I_2×···×I_K} and factor matrix sets C_k and D_k, C_k ∈ ℝ^{I_k×H_k}, D_k ∈ ℝ^{J_k×H_k}, where C_k and D_k are derived from Problem 2, a core tensor χ̂ is determined, where χ̂ ∈ ℝ^{J_1×J_2×···×J_K} and χ̂ minimizes the target function F_RTD.
The solution relies on Lemma 2, which assumes that the factor matrices satisfy equation (12) and gives the partial derivative of the target function ψ.
After the aforementioned analysis, Proof 4 gives the solution to Problem 3 and its mathematical proof process while combining Lemma 2 and Proof 3.
Proof 4. We use υ to represent χ ∏_{k}^{K} ×_k C_k and c to represent τ ∏_{k}^{K} ×_k D_k, where τ is the identity tensor, τ ∈ ℝ^{J_1×J_2×···×J_K}. Then, the function F_RTD can be rewritten as F_RTD = ‖υ − χ̂c‖_F^2, that is, F_RTD = tr((υ − χ̂c)^T (υ − χ̂c)). By the definition of function tr, tr(c^T χ̂^T υ) = tr(χ̂^T υ c^T).
Algorithm 2 implements RTD by using Algorithm 1. Function TSPA in line 1 represents the implementation of Algorithm 1, and its inputs are χ, η, and w_η. Function F_RTD in line 2 shows the calculation of χ̂ using equations (18)-(21). Finally, the core tensor of χ is obtained by using Algorithm 2, and it approximates the restricted tensor η in terms of tensor structure and element information.

Input: Tensor χ, which represents the original legal case data, χ ∈ ℝ^{I_1×I_2×···×I_K}; the restricted tensor η ∈ ℝ^{J_1×J_2×···×J_K}; and its weight w_η.
Output: The factor matrix sets C_k and D_k, C_k ∈ ℝ^{I_k×H_k}, D_k ∈ ℝ^{J_k×H_k}.
Initialize the factor matrix sets C_k and D_k;
for i = 1 to max_iterations do
    // First, pick the elements in the factor set C_k as variables
    for k_0 = 1 to K do
        Z = calcu_Z(χ, η, C_k, D_k, w_η, k_0);
        P, S, Q^T = SVD(Z);
        C_{k_0} = PQ^T;
    end
    // Then, pick the elements in the factor set D_k as variables
ALGORITHM 1: The solution of Problem 2 by using the ALS algorithm.

The new gate uses the similarity matrix between samples as a parameter of the deep neural network training model. Compared with the original bidirectional RNN, the classification result of rdRNN is more accurate and stable. For the intelligent judgment of legal cases, the original deep neural network method does not consider the correlation between legal cases. This disregard may lead to bias in the final case classification. For example, the verdict of a legal case may be inconsistent with the description of the case. To solve this problem, rdRNN fully considers the judgment results of legal cases that are similar to the case to be judged. rdRNN uses these results as a parameter of the deep neural network training model and realizes an efficient and accurate classification of multiple accusations in legal cases. The training of rdRNN's deep neural network, with Rmsprop as its optimization function, proceeds as follows:
(i) We use the dataset {(χ̂^(n), L^(n))} and the restricted tensor η as inputs of rdRNN. χ̂^(n) is the core tensor of χ^(n), which represents the legal case. χ̂^(n) is obtained by the RTD algorithm with χ^(n) and η as its inputs. L^(n) represents the category label of χ^(n) according to the judgment result of the legal case.
(ii) In this study, we combine rdRNN with the softmax layer to complete the classification of legal cases. For sample χ̂^(n), assuming h^(n) is the output vector of rdRNN, the softmax layer implements the mapping of h^(n) to the legal case category L^(n).
(iii) We use cross entropy crosen_F as the loss function to update rdRNN. rdRNN uses its forward propagation algorithm and error backpropagation formulas to iterate over the values of the parameters in the neural network, such as the weight matrices w_d and bias terms b_d that are associated with the relationship gate, and the restricted tensor η, where d is the number of hidden layers.
(iv) We select Rmsprop as the optimization function of rdRNN. Rmsprop completes the optimization and calculation of parameters w_d, b_d, and η by using ∂crosen_F/∂w_d, ∂crosen_F/∂b_d, and ∂crosen_F/∂η.
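The Rmsprop update mentioned in the training procedure above can be sketched as follows. This is the standard Rmsprop rule applied to a toy quadratic loss, not the training loop of this study; the learning rate, decay factor, and the stand-in loss ‖η − target‖² are assumptions chosen so that the example is self-contained.

```python
import numpy as np

def rmsprop_step(param, grad, cache, lr=1e-2, decay=0.9, eps=1e-8):
    """One Rmsprop update: scale the step by a running RMS of the gradient."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache

# Toy stand-in for d(crosen_F)/d(eta): the gradient of ||eta - target||^2.
target = np.array([1.0, -2.0, 0.5])
eta = np.zeros(3)
cache = np.zeros(3)
for _ in range(2000):
    grad = 2.0 * (eta - target)
    eta, cache = rmsprop_step(eta, grad, cache)
```

In rdRNN the same rule would be applied to w_d, b_d, and η, with each gradient supplied by backpropagation instead of the closed-form gradient used here.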

Calculation of Forward Propagation in rdRNN.
In this study, we fully consider the relationship between legal cases and set up a new gate to complete the classification of legal cases, eliminate contingency errors as much as possible, and avoid inconsistencies between the predicted judgment result and the actual case. Relationship control gate r_t is used to control the similarity relationship between legal cases. r_t helps the rdRNN deep neural network make an intelligent judgment by using the judgment results of cases that are similar to the case to be judged. rdRNN can be divided into forward and backward LSTM. These networks do not have obvious differences, except for the opposite propagation direction. In the case of the rdRNN forward LSTM propagation network, the formal description of relationship control gate r_t is as follows:

$r_t = \sigma(w_r \cdot [h_{t-1}, x_t] + b_r)$,

where w_r and b_r are the weight matrix and bias term of the relationship control gate r_t, respectively, σ is the activation function, i.e., the sigmoid function, h_{t−1} is the output unit state of the neuron at time t − 1, and x_t is the input value of the neuron at time t.
Input: Tensor χ, which represents the original legal case data, χ ∈ ℝ^{I_1×I_2×···×I_K}; the restricted tensor η ∈ ℝ^{J_1×J_2×···×J_K}; and its weight w_η.
Output: The core tensor χ̂, which is close to the restricted tensor η in terms of tensor structure and element values, χ̂ ∈ ℝ^{J_1×J_2×···×J_K}.
// Solving the factor matrix sets C_k and D_k using Algorithm 1
C_k, D_k = TSPA(χ, η, w_η);
χ̂ = F_RTD(χ, C_k, D_k);
return χ̂;
ALGORITHM 2: The restricted tensor decomposition method.

In the forward LSTM network, the output of each neuron at time t is calculated by the following formula:

$r_t = \sigma(net_{r,t}), \quad f_t = \sigma(net_{f,t}), \quad i_t = \sigma(net_{i,t}), \quad o_t = \sigma(net_{o,t}), \quad \tilde{c}_t = \tanh(net_{c,t})$,

where r_t, f_t, i_t, and o_t are the relationship control, forget, input, and output gates, respectively; c̃_t is the unit state of current inputs; net_{r,t}, net_{f,t}, net_{i,t}, and net_{o,t} are the weighted inputs of their corresponding gates at time t; net_{c,t} is the weighted input of the input state generation function tanh; σ is the activation function, i.e., the sigmoid function, σ(x) = 1/(1 + e^{−x}); tanh is the hyperbolic tangent function, tanh(x) = (e^x − e^{−x})/(e^x + e^{−x}); w_r is the weight matrix of relationship control gate r_t, w_r = [w_{rh}, w_{rx}]; w_f, w_i, and w_o are expressed in the same manner as w_r; and b_r, b_f, b_i, and b_o are the bias terms of their corresponding activation functions. Subsequently, the unit state of the current moment c_t is calculated from f_t, c_{t−1}, i_t, and c̃_t. The calculation formula is expressed as follows:

$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t$. (24)

The final output of the forward LSTM neural network at time t is calculated from o_t, c_t, r_t, and the similar list of x, List(x). List(x) is composed of the legal cases seen so far whose similarity with x is greater than a threshold Max_sim. h_{x_0} refers to the output of the forward LSTM neural network that corresponds to legal case x_0. Sim(·) is a function that calculates the similarity between legal cases.
In this study, we set function Sim as the weight of the Euclidean distance and the cosine distance between legal cases.
where D Euclidean and D cosine refer to the Euclidean distance and the cosine distance between the vectors x and x 0 , respectively. w d is the weight matrix.
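The similarity function and the thresholded similar case list can be sketched as follows. This is a sketch under stated assumptions: the exact form of Sim is not recoverable from the text, so the code maps each distance to a similarity score and mixes the two with a two-element weight vector standing in for $w_d$. The names `sim`, `similar_list`, and the default weights are hypothetical.

```python
import numpy as np

def sim(x, x0, w_d=(0.5, 0.5)):
    """Weighted mix of Euclidean- and cosine-based similarity (assumed form)."""
    d_euclidean = np.linalg.norm(x - x0)
    cos = float(x @ x0 / (np.linalg.norm(x) * np.linalg.norm(x0)))
    d_cosine = 1.0 - cos                 # cosine distance in [0, 2]
    s_euclidean = 1.0 / (1.0 + d_euclidean)  # assumption: distance -> (0, 1]
    s_cosine = 1.0 - d_cosine / 2.0          # assumption: distance -> [0, 1]
    return w_d[0] * s_euclidean + w_d[1] * s_cosine

def similar_list(x, history, max_sim):
    """Keep (output_state, similarity) for past cases with Sim > max_sim.

    `history` is a list of (case_vector, output_state) pairs seen so far.
    """
    result = []
    for x0, h_x0 in history:
        s = sim(x, x0)
        if s > max_sim:
            result.append((h_x0, s))
    return result

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
matches = similar_list(a, [(a, "h_a"), (b, "h_b")], max_sim=0.9)
```

With these inputs only the identical case passes the threshold, which mirrors the role of Max_sim in filtering weakly related cases.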

Calculation of Backpropagation in rdRNN.
In this section, we describe in detail the backpropagation algorithm of the rdRNN neural network, including the backpropagation of the error along time and through the hidden layer. In rdRNN, the forward and backward LSTM neural networks follow the same principle in the backpropagation algorithm. Therefore, this section mainly uses the forward LSTM as an example.
Let the error term at time $t$ be $\delta_t = \partial crosen\_F / \partial h_t$. Backpropagating the error term along time amounts to calculating $\delta_{t-1} = \partial crosen\_F / \partial h_{t-1}$, which is obtained from the full derivative formula.

Calculation of the Partial Derivative of Loss Function crosen_F to Restricted Tensor η.
This study proposes a new intelligent method for judging legal cases called RnRTD, which combines rdRNN and RTD to complete the classification of legal cases. Training the RnRTD neural network raises a new problem: the value of the restricted tensor η must be updated so that it continuously approaches the tensor value that is most beneficial for the classification accuracy of the RnRTD algorithm.
The crux of solving this problem is to calculate the partial derivative of the loss function crosen_F with respect to the restricted tensor η, that is, $\partial crosen\_F / \partial \eta$. Solving for this value directly is difficult, so we apply the full derivative rule. The backward propagation formula of rdRNN shows that

$$\frac{\partial crosen\_F}{\partial \chi^{(n)}} = \frac{\partial crosen\_F}{\partial net_1} \cdot \frac{\partial net_1}{\partial \chi^{(n)}} = \delta_1 w_0 .$$

According to equations (19) and the full derivative rule of functions, $\partial crosen\_F / \partial \eta$ can then be obtained.
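One plausible way to write the chain-rule step is given below. It is a sketch rather than the authors' equation: it assumes that the tensor $\chi^{(n)}$ fed to rdRNN is the core tensor produced by RTD and therefore depends on η, and that the contributions of the $N$ training samples are summed.

```latex
\frac{\partial\, crosen\_F}{\partial \eta}
  = \sum_{n=1}^{N}
    \frac{\partial\, crosen\_F}{\partial \chi^{(n)}}
    \cdot
    \frac{\partial \chi^{(n)}}{\partial \eta},
\qquad
\frac{\partial\, crosen\_F}{\partial \chi^{(n)}} = \delta_1 w_0 .
```

Under this reading, the first factor comes from backpropagation through rdRNN, and the second factor comes from differentiating the RTD mapping with respect to η.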

Loss Function crosen_F and Softmax Layer.
In Algorithm 3, we use the softmax function to calculate the probability that $\chi^{(n)}$ belongs to each type of legal case according to judgment results.
Definition 9. Given a set of samples of legal cases and their corresponding RnRTD outputs $(\chi^{(n)}, S^{(n)})$, the probability that $\chi^{(n)}$ belongs to each type of legal case is calculated by applying the softmax function to $S^{(n)}$, which yields $L_1^{(n)}$, where $L_{1q}^{(n)}$ represents the $q$th element of $L_1^{(n)}$.
In this study, cross entropy is used as the loss function crosen_F to calculate the error of RnRTD. We define crosen_F as follows:
Definition 10. A set of samples of legal cases and their corresponding legal case types $(\chi^{(n)}, L^{(n)})$ is given. The predicted legal case category of $\chi^{(n)}$ is $L_1^{(n)}$, which is calculated by RnRTD, and crosen_F is the cross entropy between $L^{(n)}$ and $L_1^{(n)}$ over the sample set, where $N$ represents the number of samples of legal cases and $q$ ranges over the dimensions of $L^{(n)}$ and $L_1^{(n)}$, that is, the types of legal cases.
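The softmax layer and the cross-entropy loss can be sketched in NumPy as follows. The code assumes the common form in which the loss is averaged over the $N$ samples and summed over the $q$ case types; whether the authors average over $N$ or use a per-label variant for multiple accusations cannot be recovered from the text. The array names `S`, `L`, and `L1` mirror the symbols in Definitions 9 and 10, and the numbers are made up for illustration.

```python
import numpy as np

def softmax(s):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = s - np.max(s, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(labels, probs, eps=1e-12):
    """Assumed form: mean over samples of -sum_q L_q * log(L1_q)."""
    per_sample = -np.sum(labels * np.log(probs + eps), axis=-1)
    return float(np.mean(per_sample))

# Two illustrative samples with three case types (hypothetical values).
S = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])   # RnRTD outputs S^(n)
L = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # true types L^(n)
L1 = softmax(S)                                     # predictions L_1^(n)
loss = cross_entropy(L, L1)
```

The small constant `eps` only guards against taking the logarithm of zero and is not part of the definition.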

Description of Experimental Data.
We use nearly 1.8 million historical legal cases obtained from a Chinese refereeing study network. These legal cases involve more than 200 types of accusations, including theft, intentional assault, smuggling, fraud, and deliberate destruction of public property. Approximately 400,000 cases involve theft, and about 200,000 cases involve intentional assault. The number of accusations involved in each case ranges from 1 to 23. Figure 8(a) shows the distribution of various accusations in the legal case data used in this study. The abscissa indicates the accusation index. For example, index 1 corresponds to bribery, and index 2 corresponds to rape. The ordinate indicates the proportion of all cases that involve the accusation. Figure 8(a) shows that the number of cases involving theft is the highest in the database used in this article. Figure 8(b) shows the distribution of the number of accusations involved in each case. The abscissa indicates the number of accusations involved in the case, and the ordinate indicates the proportion of cases with the corresponding number of accusations. Figure 8(b) shows that the highest number of accusations involves three cases.

Baseline Approaches.
Given that multiaccusation judgment based on deep neural networks and tensor decomposition is rarely studied, we build the baselines around the restricted tensor decomposition method RTD and the relationship-driven recurrent neural network rdRNN and use the following methods for comparison with the RnRTD approach proposed in this study:

Data Preprocessing.
In this study, the data preprocessing operation can be divided into two parts, namely, the modular representation of legal cases and the construction of the original tensor. Our legal case data preprocessing process can be described as follows:
(i) We organize each case in the legal case database into our preestablished case model, which divides the original case file into the defendant's statement, the plaintiff's statement, the content of the public prosecution's allegations, and the court's judgment.
(ii) We filter and clean the contents of each module in the legal cases, extract the words that are meaningful to our multiaccusation judgment method, and filter out redundant words, stop words, noisy words, and modal particles.
(iii) We train the word-to-vector model to obtain the word vectors of the aforementioned vocabulary. Then, we obtain a matrix representation of each case module and derive the tensor representation of the entire legal case.

For Step (i), each case module may be spread across different paragraphs, and cases in different regions have different case descriptions. We extract and integrate them separately to arrive at a modular representation of the cases based on the description rules of case documents in each region. For Step (ii), the extraction and filtering of the vocabulary in legal cases often requires professional legal background knowledge; otherwise, filtering errors will occur. We filter words in legal case modules by using the legal professional vocabulary and the stop word list. For Step (iii), word vectors are the basis for the accuracy of the entire deep neural network method. We use a number of Chinese corpora, such as those from Baidu Encyclopedia, Zhihu Questions and Answers, Sohu News, and Sina Weibo, to train the word-to-vector model. The tensor representation of legal cases and the subsequent deep neural network classification method require each case to have the same number of words, and the number of words in 95% of the cases is below 300. Therefore, we perform a padding operation for cases where the number of words is less than 300. For cases with more than 300 words, we use the TF-IDF weights of the vocabulary to trim the case vocabulary.
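The fixed-length step described above can be sketched as follows. The sketch assumes that trimming keeps the 300 words with the highest TF-IDF weight while preserving their original order, and that padding appends a placeholder token; the text does not specify either detail. The function name `fit_length`, the `<pad>` token, and the `tfidf` dictionary are hypothetical.

```python
import numpy as np

MAX_WORDS = 300  # fixed case length reported in the text

def fit_length(tokens, tfidf, max_words=MAX_WORDS, pad_token="<pad>"):
    """Pad short cases and trim long ones to exactly `max_words` tokens.

    `tfidf` maps a word to its TF-IDF weight (unknown words get 0.0).
    """
    if len(tokens) > max_words:
        scores = np.array([tfidf.get(t, 0.0) for t in tokens])
        # Positions of the highest-weighted words, restored to text order.
        top = np.argsort(-scores, kind="stable")[:max_words]
        tokens = [tokens[i] for i in np.sort(top)]
    return tokens + [pad_token] * (max_words - len(tokens))

short_case = fit_length(["a", "b"], {}, max_words=4)
long_case = fit_length(
    ["a", "b", "c", "d"],
    {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.2},
    max_words=2,
)
```

Keeping the surviving words in their original order matters here because the downstream network treats each case as a sequence.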
ALGORITHM 3: The framework of RnRTD algorithm.
Input: ($\chi^{(n)}$, $L^{(n)}$), where $\chi^{(n)}$ represents the legal cases and $L^{(n)}$ represents the category of legal case corresponding to $\chi^{(n)}$ according to judgment results. The size of η, w_r,
Output: The optimal restricted tensor η, parameters of rdRNN w_r, w_f,
for i = batch_indices to (batch_indices + batch_size) do

Experimental Hyperparameter Setting.
This section describes the hyperparameter settings involved in our proposed method. These settings include the restricted tensor η and weight matrix $W_\eta$ in RTD and the size of the similar case list in the rdRNN method (i.e., the size of list | * | in equation (25)). The setting of restricted tensor η directly affects the convergence speed and accuracy of RnRTD. Our experiments show that a large rank of restricted tensor η corresponds to a high accuracy of the subsequent deep neural network algorithms. Conversely, a strong linear relationship between column or row vectors in η results in a low accuracy of the subsequent classification algorithms.
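A candidate restricted tensor can be screened for linear dependence before training. The sketch below interprets "rank" as the multilinear rank, i.e., the matrix rank of each mode unfolding; this interpretation is an assumption, since the text does not say which notion of tensor rank is meant. The helper names and the example shapes are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding: rows index that mode, columns index the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_ranks(tensor):
    """Matrix rank of every mode unfolding (the multilinear rank)."""
    return [int(np.linalg.matrix_rank(unfold(tensor, k)))
            for k in range(tensor.ndim)]

rng = np.random.default_rng(0)
eta_random = rng.normal(size=(4, 5, 6))  # generic entries: full rank per mode
eta_constant = np.ones((4, 5, 6))        # every fiber identical: rank 1
ranks_random = mode_ranks(eta_random)
ranks_constant = mode_ranks(eta_constant)
```

A tensor such as `eta_constant`, whose row and column vectors are all linearly dependent, is the kind of initialization the experiments above suggest avoiding.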
In this study, weight matrix $W_\eta$ is used to scale the elements of the last mode in the original tensors. $W_\eta$ adjusts the weights of certain words in the legal cases, because the same word may carry different weights in cases involving different accusations. For example, the word "derailment" carries a large weight in cases that involve bigamy and a small weight in cases that involve smuggling. The size of the similar case list in rdRNN is an important indicator that determines the impact of the relationship between cases on the final classification result. If the similar case list is too long, then weak similarities between cases are strengthened and strong similarities are weakened. Conversely, if the similar case list is too short, then weak similarities between cases are weakened and strong similarities are strengthened. After many experiments, we set the length of the similar case list to 50.
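A bounded similar case list of length 50 can be sketched with a fixed-size queue. The eviction policy is an assumption: the sketch drops the oldest entry when the list is full, which matches the idea of considering similar cases within a certain period, but the text does not state how entries are replaced. The class name `SimilarCaseList` and its methods are hypothetical.

```python
from collections import deque

LIST_LEN = 50  # list length chosen in the experiments

class SimilarCaseList:
    """Fixed-length window of similar cases; oldest entries are evicted."""

    def __init__(self, maxlen=LIST_LEN):
        self._items = deque(maxlen=maxlen)

    def add(self, case_id, similarity, output_state):
        self._items.append((case_id, similarity, output_state))

    def __len__(self):
        return len(self._items)

    def case_ids(self):
        return [cid for cid, _, _ in self._items]

    def weights(self):
        """Similarities normalized to sum to 1 (assumed weighting scheme)."""
        total = sum(s for _, s, _ in self._items)
        return [s / total for _, s, _ in self._items] if total else []

window = SimilarCaseList()
for i in range(60):
    window.add(case_id=i, similarity=0.5 + i / 200, output_state=None)
```

After 60 insertions only the 50 most recent cases remain, so the list length directly bounds how many historical cases can influence the current output.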

Experimental Results and Analysis.
This section shows the superiority of the proposed RnRTD method for multiple accusations in legal cases relative to the baselines listed in Section 5.2 and provides the corresponding analysis. Figure 9 shows a series of experimental results based on Bi-LSTM. The abscissa indicates the number of batch iterations, and the ordinate indicates the accuracy of the multiaccusation judgment methods in legal cases. In contrast with the original Bi-LSTM method and Bi-LSTM with only the rdRNN layer, Bi-LSTM with only RTD reaches stable accuracy the fastest as the number of batches increases.
The characteristics of RTD are important factors in the aforementioned phenomenon. On the basis of the restricted tensor η, RTD extracts the tensor elements and structure information that are most relevant to the multiaccusation judgment of legal cases from the original tensor. The weight of the vocabulary unrelated to a particular accusation is considerably weakened, and the weight of the vocabulary associated with a particular accusation is strengthened. The tensor dimension is greatly reduced, and the influence of irrelevant vocabulary on the classification algorithm is reduced. Subsequent neural network algorithms continuously iterate and optimize the restricted tensor and continuously adjust and correct the element values of the core tensor. RTD optimizes the original deep neural network algorithm at the lexical level.
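The dimension reduction from an original tensor to a smaller core tensor can be illustrated with a generic Tucker-style projection. This is not the RTD method itself: the constraints imposed by η and $w_\eta$ and the way the factor matrices are solved are omitted, and the factor matrices here are arbitrary orthonormal bases chosen only for illustration. All names and shapes are hypothetical.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.tensordot(matrix, tensor, axes=(1, mode))  # new axis comes first
    return np.moveaxis(moved, 0, mode)

def project_to_core(tensor, factors):
    """Core = tensor x_1 U_1^T x_2 U_2^T ... with U_k of shape I_k x J_k."""
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 5, 4))                 # stand-in for a case tensor
target = (3, 2, 2)                             # stand-in for the size of eta
factors = [np.linalg.qr(rng.normal(size=(i, j)))[0]
           for i, j in zip(x.shape, target)]   # orthonormal columns
core = project_to_core(x, factors)
```

The core tensor has the target shape (3, 2, 2), which is the sense in which the input to the downstream network becomes much smaller than the original tensor.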
In Figure 9, as the number of batches increases, the accuracy of Bi-LSTM with only the rdRNN layer ultimately becomes higher than that of Bi-LSTM with only the RTD layer. The reason is that rdRNN fully considers the similarity between different cases and discriminates better between similar cases that are described in different ways. By setting an appropriate similar case list size, rdRNN fully considers cases that are similar to the current case and weighs their corresponding output states according to their similarity. rdRNN corrects and optimizes the original deep neural network at the case level.
Figure 10 shows the experimental results of the TextRNN-based RnRTD method. Similar to what is shown in Figure 9, TextRNN with only the RTD layer converges the fastest as the number of batches increases compared with the original TextRNN and TextRNN with only the rdRNN layer. The accuracy of the deep neural network method with only the rdRNN layer is not always higher than that with only the RTD layer. Although the rdRNN layer corrects and optimizes subsequent classification algorithms at the case level through the setting of similar case lists, the RTD layer also optimizes classification algorithms at the vocabulary level by setting the restricted tensor. Both methods improve the final accuracy but have different effects in different contexts. RnRTD combines the advantages of RTD and rdRNN to achieve rapid convergence and high classification accuracy.
Table 2 provides an experimental comparison of RnRTD methods based on multiple deep neural networks. RnRTD remarkably improves the classification accuracy of the original neural networks for the classification of multiple accusations in legal cases. The RTD and rdRNN layers also have considerable optimization effects on the original neural networks.
RTD is applicable to all deep neural networks and can extract the main information carried by the data at the input layer to realize dimension reduction. rdRNN is an optimization strategy that is suitable for RNNs. It fully considers the similarity between cases within a certain period and optimizes algorithms at the case level. For algorithms based on convolutional neural networks, we remove the relationship control gate in rdRNN while retaining the similar case list. Then, the optimization of these algorithms is realized by the rdRNN layer. The convolutional neural network is less effective than RNN because legal case data are time series data. In addition, the attention layer only changes the encoding of the input and does not change the structure of neural networks. For the problem of judgment for multiple accusations in legal cases, the attention layer therefore struggles to compensate for the lack of timing information in TextCNN and for the vanishing and exploding gradients of the TextRNN algorithm. From the perspective of the rdRNN layer, GRU has fewer adjustable parameters than LSTM and Bi-LSTM, and optimization on the restricted tensor is relatively limited. Therefore, the Bi-LSTM neural network with RnRTD performs better than other neural network algorithms.

Discussion
In this study, we propose a new method for multiaccusation judgment in legal cases called RnRTD. RnRTD is a multilabel classification method based on tensor decomposition and RNNs. RnRTD consists of the tensor decomposition method with constraints and relation-driven RNN.
We propose a tensor decomposition method with constraints, namely, RTD. We use this method to extract the tensor structure and element information that are most favorable to the subsequent classification algorithm from the original tensor that represents the legal case. RTD continuously corrects and optimizes the values of elements in the core tensor through the weight matrix and restricted tensor; hence, it continues to improve the classification accuracy of the neural network. RTD optimizes neural network classification algorithms at the lexical level. We also propose a relation-driven RNN strategy called rdRNN. Unlike traditional recurrent and LSTM neural networks, rdRNN sets up a new gating switch, that is, the similarity list window. It controls the impact of cases similar to the current case on the output status of the current case. rdRNN optimizes neural network classification algorithms at the case level.
According to our experimental results, the RTD layer and the relationship-driven recurrent neural network rdRNN have remarkable optimization effects on various deep neural network algorithms. However, no obvious relationship exists between the two. RTD and rdRNN have their own advantages in different contexts. In Figure 9, the accuracy of rdRNN is ultimately higher than that of RTD, but Figure 10 shows that the accuracy of rdRNN is not always higher than that of RTD. RTD achieves stable accuracy the fastest as the number of batches increases in both figures. The reason is the principal component extraction and dimensionality reduction performed by RTD itself.
RTD is suitable for almost all deep neural networks. It performs principal component extraction and dimensionality reduction on the original data at the input layer. It is similar to traditional principal component extraction methods, such as PCA [20] and SVD [21]. Several decomposition methods [22], such as Tucker and CP [23], are currently used for high-dimensional data. These methods extract the main elements and structural information of the matrix or tensor at the logical level according to the linear relationship of the elements in the matrix or tensor. However, the resulting new matrix or tensor structure is often difficult to interpret. With traditional matrix or tensor decomposition methods, supervising the principal component extraction is difficult. The proposed restricted tensor provides interpretability for the tensor decomposition operation. Under the influence of the restricted tensor, RTD retains the information in the original tensor that is beneficial to the subsequent neural network and removes useless information. For the overall classification algorithm, RTD reduces the weight of weakly correlated information and increases the influence of strongly correlated information on the classification model. In addition, the subsequent deep neural network algorithm continuously updates and optimizes the constraints in RTD to guide the core tensor to retain information that is conducive to the classification model. RTD optimizes the classification model at the vocabulary level by combining the weight matrix with the restricted tensor. rdRNN fully considers the similarities between cases and uses them as a factor that influences the output status of the current case. rdRNN optimizes the entire classification model at the case level.
Generally, different regions may use different legal case description vocabularies, and rdRNN sets the output status of similar historical cases within a certain period as the reference value of the current case output state by setting a similar case window. Moreover, it sets the weight according to the similarity. RnRTD combines RTD and rdRNN to optimize the classification results from the perspective of case and vocabulary. When we use rdRNN to optimize algorithms based on the convolutional neural network, we remove the relationship control gate, retain only the similar case list in rdRNN, and realize the optimization operation of the rdRNN layer on the neural network.

Conclusion
In this study, we propose a new method for judging multiple crimes in legal cases, namely, RnRTD. RnRTD consists of RTD and rdRNN. RTD is a tensor decomposition method with constraints. RTD decomposes the original tensor that represents a legal case into a core tensor under the guidance of the restricted tensor. The resulting core tensor represents the main tensor structure and element information that is most favorable for improving the accuracy of the subsequent classification algorithm. We propose the rdRNN algorithm and train it using the obtained core tensors. rdRNN guides the tensor decomposition process in RTD by continuously optimizing the restricted tensor and finally steers RTD in the direction that is most beneficial for the classification accuracy of rdRNN. Nevertheless, this study has several problems. For example, even with the RTD tensor decomposition layer, algorithms based on RNNs usually run very slowly. In our future work, we will attempt to reduce the computational complexity of the algorithm and increase its speed.
Data Availability
The processed legal case data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, 6/12 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.