Learning Users' Intention of Legal Consultation through Pattern-Oriented Tensor Decomposition with Bi-LSTM

Online legal consultation plays an increasingly important role in modern rule-of-law society. This study aims to understand the legal consultation intention of users who differ in language expression and legal knowledge background. A critical issue is how to classify users' legal consultation data and extract the features of each category. Traditional classification methods rely considerably on lexical and syntactic features and frequently require strict sentence formatting, which consumes substantial effort and may not be universally applicable. We aim to extract the patterns of users' consultation in different categories, which depend minimally on lexical features, syntax, and sentence formatting. However, research in this area has rarely been conducted in previous legal advisory service studies. In this study, a classification approach for multiclass users' intention based on pattern-oriented tensor decomposition and Bi-LSTM is proposed, in which each user's legal consulting statement is expressed as a tensor. Moreover, we propose a pattern-oriented tensor decomposition method that obtains a core tensor approximating the patterns of users' consultation. These patterns can improve the accuracy of classifying users' intention of legal consultation. We use Bi-LSTM to automatically learn and optimize these patterns. Bi-LSTM with a pattern-oriented tensor decomposition layer performs better than a recurrent neural network alone. Results show that our method is more accurate than previous work, and that the factor matrices and core tensor calculated by the pattern-oriented tensor decomposition are interpretable.


Introduction
With the increase in demand for online legal consultation [1], understanding the intention of different users for legal consultation is a problem that must be solved [2]. Different users have various language expressions and levels of legal knowledge [3]. For example, User A inquired as follows: "He sneaked into my house and stole three thousand dollars, how to judge?", and User B asked as follows: "Burglary $2500, how many sentences and fines should be sentenced?" Users A and B both described burglary cases involving comparable amounts ($3000 and $2500). These users expressed the same intention of legal consultation and could be assigned to the same category. Thus, a crucial step in understanding users' legal counseling intentions is classifying users' legal consulting statements. Traditional intention classification methods extract sentence features that rely heavily on lexical and syntactic characteristics, and generally require sentences to have a strict format. However, users' legal consultation data are typically colloquial, disordered, and unprofessional, which creates numerous difficulties for traditional methods of intent classification.
In previous works, understanding users' intent of legal consultation has rarely been addressed, especially the classification of users' intent on colloquial, unprofessional, and disordered legal consultation datasets [4]. Figure 1 illustrates the framework of the intent classification model of users' legal consulting statements. The problems to be solved mainly include modeling and classifying user legal consulting statements. Traditional intention classification methods are dedicated to feature extraction at the lexical and syntactic layers, and they typically require regular datasets with a professional knowledge background to achieve high classification accuracy [5]. Obtaining these datasets requires expert knowledge and substantial manual effort.
Definition 1 (the intention of the users' legal consulting statement). The intention of a user's legal consulting statement is the category of consultation involved in it, such as process, assistance, crimes, and judgments on legal cases.
We define the intention of users' legal consulting statements in Definition 1. This article formalizes the problem of understanding users' legal intention as Problem 2. This study proposes a new method for understanding users' intention of legal consultation. In terms of modeling methods for user legal consulting statements, we propose a pattern-oriented tensor decomposition method. We focus on extracting the patterns of user legal consultation for different categories, rather than features at the lexical and syntactic levels. These patterns can be regarded as a kind of data structure and are less dependent on vocabulary, grammar, and sentence formatting than the features used by traditional intention classification methods. The pattern-oriented tensor decomposition method is used to extract structured information from a user's legal consulting statement, and this structured information approximates the user legal consulting patterns. For example, we denote the user legal consulting statement as the tensor $\mathcal{X}$ and the user consultation pattern derived by Bi-LSTM as $\nu$. Then, we use $\mathcal{X}$ and $\nu$ as inputs of the pattern-oriented tensor decomposition method and obtain a core tensor $\chi$, which is construed as the structured information of tensor $\mathcal{X}$ and approximates pattern $\nu$. $\chi$ carries not only the vocabulary and syntax data but also the structured information of $\mathcal{X}$.
In terms of classification model optimization, this study proposes a user legal consulting intent classification method on the basis of Bi-LSTM and pattern-oriented tensor decomposition. We use Bi-LSTM to automatically learn and optimize users' legal consultation patterns and obtain patterns that are highly favorable for classifying users' legal consultation intention. Moreover, Bi-LSTM preceded by a pattern-oriented tensor decomposition layer classifies users' consultation intent more accurately, and places looser requirements on the datasets, than Bi-LSTM that takes the users' legal consulting statement tensor directly as input. Furthermore, the core tensor obtained by the pattern-oriented tensor decomposition method contains structured information that approximates the legal consultation patterns of different categories. Simultaneously, the dimension of the core tensor is considerably lower than that of the original tensor. The core tensor can be regarded as the main structured information of the original tensor for user intention classification.
The main contributions of this study are summarized as follows: (i) …
This paper is organized as follows: Section 2 describes recent research on classifying texts in the legal field and related works on intent recognition. Section 3 introduces related background knowledge, such as several relevant methods, definitions, and notations. Section 4 details the method proposed in this study for user legal advice intent, that is, Bi-LSTM with the pattern-oriented tensor decomposition method. Section 5 presents the relevant comparison experiments and result analysis.

Related Work
In recent years, research on the understanding of user intent based on deep neural networks and tensor layers in the field of legal services has been rarely conducted. In the past ten years, researchers at the intersection of law and computer science have concentrated on the classification of legal documents [6]. Text classification in the legal field includes the classification and understanding of legal cases, judgment documents, entities involved in legal cases, and laws and regulations.
In [7], Sulea, Zampieri, and Malmasi studied the application of text classification in the legal field for professionals. The authors proposed a method for predicting the judgments of the Supreme Court in France on the basis of machine learning algorithms and statistics, and suggested a technique for accurately retrieving similar cases and a weight fluctuation algorithm that models how the influence of a case changes over time. The SVM algorithm was mainly used to classify the relevant documents in legal cases, and the judgments of the legal cases were thereby predicted. In [8], Sarwar, Karim, and Naeem studied software copyright disputes between a user and a software program owner through a semisupervised machine learning algorithm; in addition, the authors predicted the copyright disputes that may arise after the user obtains the software license. Copyright disputes in using software licenses are common problems. After users obtain software licenses, they can use the software for a period in accordance with the software usage rules.
In [9], Galgani and Hoffmann proposed a method for classifying legal references through incremental knowledge acquisition. This method can be automated to extract the main objectives from legal text summaries. These authors created sizable training and test corpora for legal citation classification from Australian court judgment reports, which are considered of high quality under Australian law. A specialized legal knowledge base, combined with machine learning algorithms, is utilized to classify legal references. In [10], Xiong studied an automatic classification system for Chinese legal texts. Traditional character-based document representations cannot be used to model Chinese legal documents; otherwise, the feature dimension explodes and the computational complexity rises. Xiong proposed a legal document clustering method on the basis of latent semantic analysis to reduce the dimension of legal text features. In addition, Xiong established a Chinese taxonomic automatic classification system using a second dimensionality reduction method built on latent semantic analysis.
In [11], Maat, Krabben, and Winkels used machine learning algorithms to classify sentences in the Dutch legal library and compared the results with the legal sentence classification outcomes of traditional pattern-based classifiers. The legal sentence classifier based on machine learning has higher accuracy than the pattern-based classifier, given accurate modeling of legal sentences and feature extraction. In [12], Bartolini proposed a management labeling system for Italian law. The method aims to cluster full texts by representing redundant long documents in vector form and thereby achieve document classification. It uses the treaty and article as clustering units and presents clustering experiment results in the form of a tree diagram.
In the general text classification field, researchers have conducted substantial research [13]. Both traditional machine learning algorithms and deep neural networks are used in text classification. From the perspective of machine learning, Nigam, McCallum, and Thrun improved the accuracy of learned text classifiers by using a large number of unlabeled documents to augment a small number of labeled documents [14]. This approach is valuable because obtaining labels for text classification is costly in practice, whereas unlabeled documents are easy to obtain. Their article uses an EM-based approach to learn from unlabeled documents. The algorithm first trains a Bayesian classifier on the small set of labeled documents and uses it to probabilistically label the unlabeled documents. Subsequently, it uses the expected labels to train a new classifier on the basis of all documents and iterates until convergence. From the perspective of deep neural networks, Kim proposed TextCNN, which is based on convolutional neural networks, for text classification and prediction [15]. Donahue proposed a structure based on recurrent neural networks for text classification, that is, TextRNN [16]. Dzmitry proposed an attention structure for deep neural networks, in which the attention layer discovers the association between input and output by adding weight parameters [17].

Preliminaries
In this section, we introduce several related methods, definitions, and notations. Section 3.1 presents the basic definitions and notations involved in this study. Section 3.2 provides a detailed explanation of the tensor decomposition operation.

Definitions and Notations.
In this study, we represent user legal consulting statements as tensors. The pattern-oriented tensor decomposition method is used to decompose these tensors, and the obtained core tensors are used in the subsequent deep neural network classification model. A tensor is a data structure similar to a vector or matrix [18,19]. Tensor decomposition is a dimensionality reduction operation on the tensor [20,21]. Similar to principal component analysis and singular value decomposition, tensor decomposition methods are devoted to extracting the main structure and compositional information in the original tensor [22,23].
A tensor is a multidimensional array [24], and we use Euler script letters (e.g., $\mathcal{X}$) to represent tensors. We refer to the dimensions of a tensor as its modes and to the number of dimensions as its order [22]. Scalars, vectors, and matrices are denoted by lowercase letters ($a$), bold lowercase letters ($\mathbf{a}$), and uppercase letters ($A$), respectively; the transpose of a matrix $A$ is denoted by $A^{T}$, and identity matrices are denoted by $I$. A square matrix whose elements are all 1 is represented by $\mathbf{1}$.

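Definition 3 (outer product). The outer product of two vectors $\mathbf{a} \in \mathbb{R}^{I}$ and $\mathbf{b} \in \mathbb{R}^{J}$ is the matrix $\mathbf{a} \circ \mathbf{b} \in \mathbb{R}^{I \times J}$ with elements $(\mathbf{a} \circ \mathbf{b})_{ij} = a_i b_j$.
Definition 6 ($n$-mode product). Given an $N$-mode tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and a matrix $A \in \mathbb{R}^{I_n \times J}$, the $n$-mode product is denoted as $\mathcal{X} \times_n A$, a tensor of size $I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N$ with elements
$$(\mathcal{X} \times_n A)_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N}\, a_{i_n j}.$$
Definition 7 ($n$-mode matricization). Given an $N$-mode tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, its $n$-mode matricization is the matrix $X_{(n)} \in \mathbb{R}^{I_n \times \prod_{k \neq n} I_k}$. The calculation fixes the $n$th mode and arranges the elements of the other modes into a long matrix.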
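The two operations above can be illustrated with a minimal NumPy sketch. It assumes the convention in which a matrix of shape $(I_n, J)$ contracts mode $n$ of the tensor; the function names `unfold` and `mode_n_product` are illustrative, and the column order of the unfolding follows NumPy's default layout, which may differ from the ordering used in the paper.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Contract axis `mode` of `tensor` (size I_n) with a matrix of shape (I_n, J)."""
    # tensordot sums over tensor axis `mode` and matrix axis 0; the new axis of
    # size J is appended last, so it is moved back to position `mode`.
    out = np.tensordot(tensor, matrix, axes=(mode, 0))
    return np.moveaxis(out, -1, mode)

X = np.arange(24, dtype=float).reshape(2, 3, 4)   # toy 3-mode tensor
A = np.ones((3, 5))                               # acts on mode 1 (size 3)
Y = mode_n_product(X, A, mode=1)                  # shape (2, 5, 4)
```

Under this convention the product and the matricization are linked by `unfold(Y, n) == A.T @ unfold(X, n)`, which is a convenient check when implementing either operation.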

Tensor Decomposition.
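Tensor decomposition is a process of approximating a tensor by a core tensor and several factor matrices [25]. In Figure 2, given an $N$-mode tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the formal expression of tensor decomposition on $\mathcal{X}$ is
$$x_{i_1 i_2 \cdots i_N} \approx \sum_{j_1=1}^{J_1} \sum_{j_2=1}^{J_2} \cdots \sum_{j_N=1}^{J_N} g_{j_1 j_2 \cdots j_N}\, a^{(1)}_{i_1 j_1} a^{(2)}_{i_2 j_2} \cdots a^{(N)}_{i_N j_N},$$
where $\{A_n\}$ is a set of factor matrices, $A_n \in \mathbb{R}^{I_n \times J_n}$, with elements $a^{(n)}_{i_n j_n}$. The factor matrices are all column orthogonal [19]. Furthermore, $\mathcal{G}$ is the core tensor, $\mathcal{G} \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$, with elements $g_{j_1 j_2 \cdots j_N}$. The tensor decomposition methods minimize the objective function [26], which is the squared Frobenius norm of the difference between $\mathcal{X}$ and this approximation.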
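As a point of reference for the unsupervised case, the following sketch computes a truncated higher-order SVD, one common way to obtain a core tensor and column-orthogonal factor matrices. It is a generic illustration and not the pattern-oriented method of this paper; the helper names and the target ranks `(2, 3, 2)` are assumptions chosen for the example.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Contract axis `mode` of `tensor` with axis 0 of `matrix`."""
    return np.moveaxis(np.tensordot(tensor, matrix, axes=(mode, 0)), -1, mode)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD: returns (core, factors)."""
    factors = []
    for mode, rank in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :rank])            # shape (I_n, J_n), orthonormal columns
    core = tensor
    for mode, A in enumerate(factors):
        core = mode_n_product(core, A, mode)   # project mode n from I_n down to J_n
    return core, factors

def reconstruct(core, factors):
    """Map the core back to the original space with the transposed factors."""
    approx = core
    for mode, A in enumerate(factors):
        approx = mode_n_product(approx, A.T, mode)
    return approx

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
core, factors = hosvd(X, ranks=(2, 3, 2))
X_hat = reconstruct(core, factors)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

The core has shape `(2, 3, 2)`, far smaller than the original tensor, and `rel_error` measures how much information the truncation discards.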

Our Approach
This paper proposes Bi-LSTM with a pattern-oriented tensor decomposition method for intention classification of users' legal consulting statements. In Section 4.1, the pattern-oriented tensor decomposition method extracts the core tensor $\chi$ from the original tensor $\mathcal{X}$ under the guidance of the pattern tensor $\nu$, so that $\chi$ approximates $\nu$. In Section 4.2, Bi-LSTM continually optimizes the pattern tensor $\nu$ so that $\chi$ carries the structural and element information of $\mathcal{X}$ that is most conducive to improving the accuracy of the intent classification model of users' legal consulting statements. As shown in Figure 3, Bi-LSTM controls the process of pattern-oriented tensor decomposition by optimizing the pattern tensor $\nu$: Bi-LSTM continually optimizes $\nu$, while the core tensor $\chi$ continues to approach $\nu$ through the pattern-oriented tensor decomposition method. Finally, $\nu$ becomes the pattern tensor with which the Bi-LSTM model reaches high accuracy, and $\chi$ is the core tensor that, under the guidance of $\nu$, benefits the accuracy of the subsequent classification model.

Pattern-Oriented Tensor Decomposition Method.
The pattern-oriented tensor decomposition method decomposes tensor $\mathcal{X}$ into a core tensor $\chi$ and two sets of factor matrices such that $\chi$ approaches the users' legal consultation pattern $\nu$; that is, the core tensor $\chi$ and the pattern $\nu$ have a similar tensor structure. The subsequent Bi-LSTM classification model controls the pattern-oriented tensor decomposition by continuously optimizing the pattern tensor $\nu$. As a result, the core tensor $\chi$ is more advantageous than the users' legal consultation data tensor $\mathcal{X}$ for enhancing the accuracy of classifying users' legal consultation intention. Simultaneously, $\chi$ achieves a dimension reduction effect, which considerably reduces computation time and memory.
The framework of the pattern-oriented tensor decomposition method is depicted in Figure 4. In this study, the problems to be solved by the pattern-oriented tensor decomposition method are defined as Problems 9 and 10.
Problem 9. Given a tensor $\mathcal{X} \in \mathbb{R}^{\cdots}$ …
In Problem 10, we can calculate the value of $\chi$ by setting the partial derivative of the objective function with respect to $\chi$ to 0. The specific conclusion is presented in Theorem 12. In Appendix A, Proof A.0.1 provides the proof of Lemma 11, and Proof A.0.2 provides the solution of (11) in Theorem 12.

Lemma 11. Given that the Frobenius function …
The following part describes the process of calculating the two sets of factor matrices in Problem 9. Under the constraints of Conditions 1 and 2, we can calculate the optimal solution of the objective function by using alternating least squares (ALS), Lemma …

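Lemma. Given the function $\delta = \operatorname{tr}((\chi \prod \cdots$ … and the factor matrix satisfies (9), the partial differential of function $\delta$ with respect to the $n_0$th factor matrix is … , where … is a constant.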
The ALS algorithm fixes all variables but one, takes the partial derivative of the objective function with respect to the remaining variable, and finds the value of that variable at which the partial derivative is zero. This value is then substituted into the original objective function. The values of the other variables are calculated in turn by the same process. The ALS iterates until the calculation error is tolerable. The process of calculating the optimal factor matrices that minimize target function (8) under Conditions 1 and 2 is demonstrated in Proof B.0.4.
Algorithm 1 demonstrates the process of the pattern-oriented tensor decomposition method. The present study uses the ALS algorithm to optimize the parameters involved in Algorithm 1. The inputs of Algorithm 1 are the tensor $\mathcal{X}$, which represents a user's legal consulting statement, and the user legal consultation pattern $\nu$, which is beneficial for classifying users' legal consultation intention. The outputs of Algorithm 1 are the core tensor $\chi$ and the corresponding two sets of factor matrices. In addition, $\chi$ can be interpreted as a feature map of the original tensor $\mathcal{X}$ in the space determined by the pattern tensor $\nu$. That is, the original users' legal consulting statement is mapped to a feature space that is beneficial for classifying users' legal consultation intention, which allows us to understand that intention more accurately.
In Line 1 of Algorithm 1, we initialize the sets of factor matrices related to the core tensor $\chi$. Then we use the ALS algorithm to calculate the optimal factor matrices that minimize the value of (8) under Condition 2. The parameter in Line 2 is the number of iterations we set for the ALS algorithm. The function in Line 4 embodies the calculation process of (B.3) in Proof B.0.4 of Appendix B. Line 5 completes the singular value decomposition (SVD) of the transition variable, which corresponds to (B.4) in Proof B.0.4 of Appendix B. Moreover, Line 6 presents the method for calculating the $n_0$th factor matrix of the first set that minimizes (8) while fixing the other factor matrices of that set ($n \neq n_0$) and the whole second set. Similarly, Lines 8 to 11 demonstrate the process of calculating the $n_0$th factor matrix of the second set to minimize (8) while fixing the other factor matrices of that set ($n \neq n_0$) and the whole first set. In Line 14, the function represents the calculation of $\chi$, the core tensor of the user's legal consulting statement $\mathcal{X}$. $\chi$ is interpreted as the result of a tensor decomposition of $\mathcal{X}$ directed toward the users' legal consultation pattern $\nu$.
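The alternating structure described above (initialize the factors, then repeatedly form a transition variable, take its SVD, and update one factor while the others are fixed) can be sketched on a simplified two-mode analogue. This is not Algorithm 1: objective (8) and Conditions 1 and 2 are not reproduced here. The sketch assumes the surrogate objective $\|X - A \nu B^{T}\|_F$ with column-orthonormal $A$ and $B$, for which each update is an orthogonal Procrustes problem, and all names are illustrative.

```python
import numpy as np

def procrustes_update(M):
    """Column-orthonormal matrix maximizing trace(A.T @ M), from the thin SVD of M."""
    U, _, Wt = np.linalg.svd(M, full_matrices=False)
    return U @ Wt

def pattern_guided_als(X, V, n_iter=50, seed=0):
    """Alternately update A and B so that A @ V @ B.T fits X (assumed surrogate)."""
    rng = np.random.default_rng(seed)
    (I1, I2), (J1, J2) = X.shape, V.shape
    # Random orthonormal initialization of both factors (requires I1 >= J1, I2 >= J2).
    A = np.linalg.qr(rng.standard_normal((I1, J1)))[0]
    B = np.linalg.qr(rng.standard_normal((I2, J2)))[0]
    errors = []
    for _ in range(n_iter):
        A = procrustes_update(X @ B @ V.T)    # update A with B fixed
        B = procrustes_update(X.T @ A @ V)    # update B with A fixed
        errors.append(np.linalg.norm(X - A @ V @ B.T))
    core = A.T @ X @ B                        # core of X in the space shaped by V
    return core, A, B, errors

rng = np.random.default_rng(1)
V = rng.standard_normal((3, 2))               # toy pattern
X = rng.standard_normal((8, 7))               # toy data matrix
core, A, B, errors = pattern_guided_als(X, V)
```

Because each half-step solves its subproblem exactly, the recorded error never increases from one iteration to the next, which mirrors the stopping rule of iterating until the error is tolerable.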

Optimization Method of Users' Legal Consulting Pattern.
In this study, we use Bi-LSTM [27] to optimize the users' legal consultation pattern $\nu$ and ensure that the final $\nu$ is a favorable pattern for the classification model of users' legal consulting intention. The optimizer of Bi-LSTM is RMSprop. The training process of Bi-LSTM with RMSprop is as follows:
(i) We use the initial user legal consultation pattern $\nu$ and the core tensor set $\{\chi^{(i)}\}$, which represents users' legal consulting statements, as the input of Bi-LSTM. Each core tensor $\chi^{(i)}$ in the set is the result of the pattern-oriented tensor decomposition method with the corresponding original tensor $\mathcal{X}^{(i)}$ and the pattern $\nu$ as inputs. $\chi^{(i)}$ approaches $\nu$ at the level of tensor structure.
(ii) The output $\{h^{(i)}_{l}\}$ of Bi-LSTM, where $l$ is the number of hidden layers, is used as the input of the softmax layer, which maps the output vectors to the categories $\{y^{(i)}\}$ of the users' legal consulting statements $\{\mathcal{X}^{(i)}\}$. The cross entropy is used as the loss function $\mathit{LossF}$ for calculating the error.
(iii) Through forward and backward propagation between LSTM units, using the formulas for error backpropagation through time in Bi-LSTM and for error backpropagation between the hidden layers of Bi-LSTM, we calculate the partial derivatives of the loss function $\mathit{LossF}$ with respect to the weight matrices $\{W_l\}$, the bias terms $\{b_l\}$, and the users' legal consulting pattern $\nu$, that is, $\partial \mathit{LossF}/\partial W_l$, $\partial \mathit{LossF}/\partial b_l$, and $\partial \mathit{LossF}/\partial \nu$, respectively.
(iv) The RMSprop optimizer iteratively updates the parameters $\{W_l\}$, $\{b_l\}$, and $\nu$ using the values of $\partial \mathit{LossF}/\partial W_l$, $\partial \mathit{LossF}/\partial b_l$, and $\partial \mathit{LossF}/\partial \nu$, respectively. Finally, we obtain the values of the weight matrices $\{W_l\}$, bias terms $\{b_l\}$, and users' legal consultation pattern $\nu$ that are favorable for the Bi-LSTM-based classification model of users' legal consultation intention.
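Step (iv) applies the same update rule to every trainable quantity, including the pattern $\nu$. A minimal sketch of one RMSprop step follows; the decay rate `0.9` and the stabilizer `eps` are common defaults assumed here, not values reported in the paper, and the quadratic toy objective only stands in for the real loss.

```python
import numpy as np

def rmsprop_step(param, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSprop update: scale the step by a running RMS of past gradients."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache

# Toy usage: minimize f(w) = ||w - target||^2, whose gradient is 2 (w - target).
target = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
cache = np.zeros(3)
for _ in range(2000):
    grad = 2.0 * (w - target)
    w, cache = rmsprop_step(w, grad, cache, lr=0.01)
```

In the full model the gradient passed to this step would be $\partial \mathit{LossF}/\partial W_l$, $\partial \mathit{LossF}/\partial b_l$, or $\partial \mathit{LossF}/\partial \nu$, each with its own running cache.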

Method for Calculating the Partial Derivative of $\mathit{LossF}$ with Respect to $\nu$.
This study proposes a method for solving the partial derivative of the loss function $\mathit{LossF}$ with respect to the users' legal counseling pattern $\nu$. Directly calculating this partial derivative is difficult. However, we can determine it indirectly by using the total derivative and chain rules.
In this study, we use the tensor $\mathcal{X}$, which represents the users' legal consulting statement, the two sets of factor matrices, and the core tensor $\chi$, which approaches the users' legal consultation pattern $\nu$ at the level of tensor structure, as transition variables. $\chi$ and the factor matrices are calculated through the pattern-oriented tensor decomposition method with $\mathcal{X}$ and $\nu$ as its inputs. The transition variables transform the derivative problem $\partial \mathit{LossF}/\partial \nu$ into a Sylvester problem. We use the Hessenberg-Schur algorithm to solve the Sylvester matrix equation and finally obtain the partial derivative of the loss function $\mathit{LossF}$ with respect to the users' legal consulting pattern $\nu$.
… tensor decomposition method with $\{\mathcal{X}^{(i)}\}$ and $\nu$ as its inputs, the partial derivative of the loss function $\mathit{LossF}$ with respect to the users' legal consulting pattern $\nu$ is obtained using … where … The auxiliary functions meet the following limitations: … The resulting equation for the derivatives of the factor matrices with respect to $\nu$ is a classic Sylvester equation, which can be solved using the Hessenberg-Schur algorithm.
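For readers unfamiliar with the Sylvester equation, the generic form is $AX + XB = Q$ with unknown $X$. The sketch below solves a small instance by Kronecker-product vectorization, which is a simple dense substitute for the Hessenberg-Schur algorithm named in the text and is only practical for small matrices. The coefficient matrices are random placeholders, not the quantities that arise in the derivation.

```python
import numpy as np

def solve_sylvester_dense(A, B, Q):
    """Solve A @ X + X @ B = Q through the identity vec(AX + XB) = K vec(X)."""
    m, n = Q.shape
    # With column-major vec: vec(AX) = (I_n kron A) vec(X), vec(XB) = (B^T kron I_m) vec(X).
    K = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    x = np.linalg.solve(K, Q.reshape(-1, order="F"))
    return x.reshape((m, n), order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 5.0 * np.eye(4)   # shifted so the system is well posed
B = rng.standard_normal((3, 3)) + 5.0 * np.eye(3)
Q = rng.standard_normal((4, 3))
X = solve_sylvester_dense(A, B, Q)
residual = np.linalg.norm(A @ X + X @ B - Q)
```

The dense system has size $mn \times mn$, which is why Schur-based methods are preferred in practice once the matrices grow.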

Algorithm 2 demonstrates the optimization method for the users' legal consulting pattern in this study. The parameter in Line 2 represents the number of training iterations of the Bi-LSTM. Lines 4 to 12 are the training steps for the Bi-LSTM model. The parameter in Line 6 represents the number of samples per mini-batch. The function in Line 7 denotes the pattern-oriented tensor decomposition method. Line 8 presents the forward propagation process in Bi-LSTM on the forward and backward layers. Line 9 describes the backpropagation of errors through time and across the neural network layers in Bi-LSTM. We use RMSprop in Line 11 as the optimizer for the parameters of Bi-LSTM.

Loss Function 𝐿𝑜𝑠𝑠𝐹 and Softmax Layer.
In Algorithm 2, we use the softmax function to calculate the probability that $\mathcal{X}^{(i)}$ belongs to each category. For intent classification of users' legal consulting statements, this function is defined as follows.
Definition 15. Given a set of users' legal consulting statement samples and their corresponding outputs of Bi-LSTM $\{(\mathcal{X}^{(i)}, y^{(i)}_1)\}$, the probability that $\mathcal{X}^{(i)}$ belongs to category $j$ is calculated using
$$\operatorname{softmax}(y^{(i)}_1)_j = \frac{\exp(y^{(i)}_{1,j})}{\sum_{k=1}^{c} \exp(y^{(i)}_{1,k})},$$
where $y^{(i)}_{1,j}$ represents the $j$th element of $y^{(i)}_1$.
In this study, the cross entropy is used as the loss function $\mathit{LossF}$ to calculate the error of Bi-LSTM. We define $\mathit{LossF}$ as follows.
Definition 16. Given a set of users' legal consulting statement samples and their corresponding categories $\{(\mathcal{X}^{(i)}, y^{(i)})\}$, the estimated category of $\mathcal{X}^{(i)}$ is $y^{(i)}_1$, which is calculated using Bi-LSTM; then
$$\mathit{LossF} = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{c} y^{(i)}_j \log \operatorname{softmax}(y^{(i)}_1)_j,$$
where $m$ represents the number of samples of users' legal consulting statements and $c$ denotes the dimension of $y^{(i)}$ and $y^{(i)}_1$, that is, the number of categories.
Algorithm 2 takes as input $\{(\mathcal{X}^{(i)}, y^{(i)})\}$, where $\mathcal{X}^{(i)}$ is the user's legal consulting statement and $y^{(i)}$ is its category.
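The softmax layer and cross-entropy loss can be sketched in a few lines of NumPy. The sketch assumes one-hot category vectors and a loss averaged over the samples; the two toy samples with three categories are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, onehot, eps=1e-12):
    """Mean cross entropy between predicted probabilities and one-hot labels."""
    return -np.mean(np.sum(onehot * np.log(probs + eps), axis=1))

# Two toy samples, three categories (values are illustrative only).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
probs = softmax(logits)
loss = cross_entropy(probs, labels)
```

Each row of `probs` sums to one, and the loss shrinks as the probability assigned to the labeled category approaches one.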

Empirical Results
We provide the results of the deep learning model with pattern-oriented tensor decomposition proposed in this study on actual datasets.The experiment verifies that the Bi-LSTM model with pattern-oriented tensor decomposition can accurately classify and understand users' legal consulting sentences and intentions comprehensively.Bi-LSTM with a pattern-oriented tensor decomposition layer is more efficient, interpretable, and instructive than traditional recurrent neural networks.

Data Description.
The data used in this study are mainly online users' legal consulting statements.Our main data sources include the China Legal Business Consulting website and various public legal consulting service platforms at the local level.In this study, approximately 150,000 legal consultations are obtained from all over China from 2008 to 2018.
These data have been manually labeled by annotators with a professional legal background and divided into 28 categories, which cover various common legal disputes, such as divorce and contract disputes, property transfer, and loan compensation. A statistical analysis of the collected dataset reveals a clear trend: from 2008 to 2018, public online legal consultations were mainly concentrated on labor and personnel disputes, civil disputes, contract disputes, property disputes, marriage relationships, and creditor's rights and debts.
Figure 5(a) displays the distribution of a subset of the users' legal consulting statements across categories. The legal disputes that people aim to solve through online legal consultation show clear tendencies and lie mainly in civil matters, such as marriage and loan disputes and property division. By contrast, major or extraordinarily serious criminal offenses are extremely rare.

Baseline Approaches.
To understand the intent of users' legal consulting statements, we proposed Bi-LSTM with a pattern-oriented tensor decomposition method. However, previous studies have rarely addressed the understanding of users' legal consulting intention using deep neural networks and tensors. We therefore design the comparison experiments of this study around the following two points:
(i) From the perspective of deep neural networks, we compare Bi-LSTM with recent neural networks, including TextCNN [15] and TextCNN attention [17], which are based on convolutional neural networks [28]; TextRNN [16] and TextRNN attention [17], which are based on recurrent neural networks; and LSTM, GRU [29], and Bi-GRU [30].
(ii) From the perspective of the tensor decomposition layer, we use common tensor decomposition algorithms for comparison with pattern-oriented tensor decomposition method, including Tucker and CP tensor decomposition algorithms.
Through the first point above, we show the performance of Bi-LSTM relative to other deep neural networks on intention classification of users' legal consulting statements.Through the second point above, we show the superiority of pattern-oriented tensor decomposition method compared to other unsupervised tensor decomposition algorithms.

Feature Extraction.
In this study, numerous preprocessing operations are performed on the obtained users' legal consulting statements. The preprocessing can be divided into two main steps:
(i) Module definition of users' legal consulting statement. In this study, each user's legal consulting statement is represented as a three-dimensional tensor. We divide a user's legal consultation into five modules, namely, subject, object, motivation, behavior, and consequences of consultation. Each module contains multiple words and is represented by a matrix of predetermined dimensions. The rows of the matrix correspond to the words contained in the module, and the columns correspond to the components of the embedding vectors of these words.
(ii) Quantitative representation of users' legal consulting statement. Because the data collected are all Chinese users' legal consulting statements, this study first performs word segmentation on the statements and removes Chinese punctuation marks, stop words, and redundant words. Furthermore, this study divides each user's consulting statement in the dataset into five modules in accordance with the previous step. The words in users' legal consulting statements are then represented as embeddings, that is, by a word-embedding operation. On this basis, this study instantiates each module of the users' legal consulting statement, vectorizes each word, and represents each user's legal consulting statement as a tensor.
The users' legal consulting statement is represented by a three-dimensional tensor.In Figure 6, the first dimension of the tensor represents modules in the statement.The second dimension represents meaningful vocabulary contained in each module.The vocabularies are derived from the original statement through removing redundant, meaningless, and repeated words.The third dimension represents the word embedding corresponding to each word.
We divide each user's legal consulting statement into five modules, namely, subject, object, motivation, behavior, and consequences of consultation.Each module in the users' consulting statement exhibits multiple entity objects.For example, in the users' legal consulting statement: "Xiao Wang repeatedly threatened me with a knife and took me more than 30,000 yuan.What crime should he sue?", the subjects of the consultation module are "me" and "Xiao Wang".Then, the object of the consultation module is "30,000 yuan".The motivation, behavior, and consequence of the counseling module correspond to "crime", "threatened", and "knife".
The word vector generation model is trained on a large Chinese corpus. Chinese Wikipedia and news from multiple websites, such as Tencent, Sohu, and Sogou, are used as corpora for Chinese word vector training [31]. We use the word2vec tool proposed by Google to train the Chinese word vectors [32]. Word2vec converts the one-hot vectors of the corpus into low-dimensional dense vectors. The word-embedding operations ensure that users' legal consulting statements can be processed by the Bi-LSTM model with pattern-oriented tensor decomposition presented in this study [33]. We fix users' legal consulting statements to the same length because the tensors representing them must have the same dimensions. The distribution of statement lengths in the database is illustrated in Figure 5(b). The number of words in most users' consulting statements is between 15 and 500, except for a particularly small number of statements with more than 2000 or fewer than 10 words. In this study, we set a baseline length: statements with more words than the baseline are truncated, and statements with fewer words than the baseline are padded.
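The tensor layout described in this section (modules by words by embedding dimensions, with truncation and padding to a fixed size) can be sketched as follows. The embedding table, the embedding dimension of 4, the per-module limit of 3 words, and zero vectors for padding and unknown words are all assumptions made for the example; the paper trains its embeddings with word2vec and does not specify these details.

```python
import numpy as np

MODULES = ("subject", "object", "motivation", "behavior", "consequence")

def statement_to_tensor(modules, embeddings, max_words, dim):
    """Build a (5, max_words, dim) tensor; long modules are cut, short ones zero-padded."""
    tensor = np.zeros((len(MODULES), max_words, dim))
    for m, name in enumerate(MODULES):
        words = modules.get(name, [])[:max_words]          # truncate to the baseline
        for w, word in enumerate(words):
            # Words missing from the (hypothetical) table keep the zero vector.
            tensor[m, w] = embeddings.get(word, np.zeros(dim))
    return tensor

# Toy embedding table standing in for trained word2vec vectors.
rng = np.random.default_rng(0)
dim = 4
vocab = ["me", "Xiao Wang", "30,000 yuan", "crime", "threatened", "knife"]
embeddings = {word: rng.standard_normal(dim) for word in vocab}

statement = {
    "subject": ["me", "Xiao Wang"],
    "object": ["30,000 yuan"],
    "motivation": ["crime"],
    "behavior": ["threatened"],
    "consequence": ["knife"],
}
T = statement_to_tensor(statement, embeddings, max_words=3, dim=dim)
```

Every statement then yields a tensor of identical shape, which is what allows the statements to be batched and decomposed uniformly.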

Parameter Adjustment and Experimental Settings.
We have implemented a tensor representation of each user's legal consulting statement in the database on the basis of the abovementioned operations.This study uses a TensorFlow development kit to complete the programming of the proposed method.Then, the parameters of the Bi-LSTM model with the pattern-oriented tensor decomposition method proposed in this study are set.
Like traditional deep neural network algorithms, the important parameters of this model include the batch size used while training the neural network, the number of layers, the number of neurons in each layer, and the number of iterations of the overall neural network algorithm. In contrast to them, the parameters of this model also include the users' legal consultation pattern. The setting of the users' legal consultation pattern strongly affects the convergence speed and accuracy of the model. Our experiments show that the classification accuracy of users' legal consulting statements is difficult to increase when the structure of the users' legal consultation pattern is degenerate, that is, when its column vectors are linearly dependent.
For all neural networks involved in the experiments of this article, including TextCNN, TextCNN attention, TextRNN, TextRNN attention, LSTM, Bi-LSTM, GRU, and Bi-GRU mentioned in Section 5.2, we trained each of them for 10 epochs with a batch size of 60, a hidden layer size of 512, 3 hidden layers, and a learning rate of 0.001. We use the TensorFlow development kit to implement the neural networks and run the programs on a Graphics Processing Unit (GPU) for fast computation.

Experimental Results and Analysis.
In this section, we report experiments on the baselines introduced in Section 5.2 and on our approach for intention classification of users' legal consulting statements. We also explain in detail the superiority of Bi-LSTM and the necessity of pattern-oriented tensor decomposition.
Figure 7 indicates that neural networks with a tensor decomposition layer converge faster and reach higher accuracy than those without one. This behavior follows from the characteristics of tensor decomposition methods. Tensor decomposition algorithms extract the main structure and element information from the original tensor while removing redundant, strongly correlated information. That is, tensor decomposition weakens the influence on the classification model of words that are weakly related to the intention of users' legal consulting statements and strengthens the influence of strongly related words. Moreover, the tensor decomposition layer reduces the dimension of the original tensors, which greatly reduces the computational complexity of the subsequent deep neural networks. Therefore, the tensor decomposition layer makes neural networks converge faster and achieve higher accuracy.
As can be seen from the pink and cyan curves in Figure 7, Tucker and CP tensor decomposition have basically the same optimization effect on neural networks. This is because CP decomposition is a special case of Tucker decomposition. Tucker tensor decomposition can be viewed as a higher-order singular value decomposition (SVD). Tucker decomposition uses the SVD algorithm to iteratively extract the main components of each mode in the original tensor and finally yields a core tensor and its corresponding set of factor matrices. When the core tensor is a diagonal tensor, Tucker decomposition reduces to CP decomposition. The core tensors obtained by Tucker and CP decomposition are only weakly interpretable. Both are unsupervised tensor decomposition methods. From the perspective of neural network algorithms, Tucker and CP decomposition can neither be steered nor learn autonomously.
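The relationship between Tucker and CP decomposition can be illustrated with a small NumPy sketch. It assumes the truncated higher-order SVD (one SVD per mode, without further iterative refinement) as a simple stand-in for a Tucker decomposition; the helper names `unfold`, `mode_dot`, `hosvd`, and `reconstruct` and the tensor sizes are illustrative and do not come from the paper.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """Mode-n product: multiply `matrix` (J x I_mode) into axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: leading left singular vectors per mode, then the core."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)  # project each mode onto its factors
    return core, factors


def reconstruct(core, factors):
    """Tucker reconstruction: core multiplied by every factor matrix."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_dot(out, u, mode)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6))

# Full ranks reproduce the tensor; smaller ranks give a compressed core.
full_core, full_factors = hosvd(x, ranks=(4, 5, 6))
small_core, small_factors = hosvd(x, ranks=(2, 3, 2))

# CP as a special case: a superdiagonal core turns the Tucker
# reconstruction into a sum of rank-one terms.
rank = 3
a = rng.standard_normal((4, rank))
b = rng.standard_normal((5, rank))
c = rng.standard_normal((6, rank))
diag_core = np.zeros((rank, rank, rank))
diag_core[np.arange(rank), np.arange(rank), np.arange(rank)] = 1.0
tucker_form = reconstruct(diag_core, [a, b, c])
cp_form = np.einsum("ir,jr,kr->ijk", a, b, c)
```

In both cases the decomposition is determined by the input tensor alone, which is what makes these methods unsupervised: nothing in the procedure lets a downstream classifier influence the resulting core tensor.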
The red curves in Figure 7 show that the pattern-oriented tensor decomposition layer improves neural networks much more than Tucker and CP decomposition do. It allows neural network algorithms to converge faster while achieving higher accuracy. The pattern-oriented tensor decomposition algorithm controls the tensor decomposition process through pattern tensors. This algorithm makes the core tensor extracted from the original tensor approximate the pattern tensor in both tensor structure and element information. On this basis, neural network classification algorithms influence the process of tensor decomposition by continuously optimizing the pattern tensor. These operations ultimately make core tensors carry the information that is most conducive to improving the accuracy of the classification model. Therefore, compared with Tucker and CP tensor decomposition, the pattern-oriented tensor decomposition method can be guided and can learn autonomously. Moreover, the resulting core tensors are more interpretable. In general, the pattern tensor is a bridge between tensor decomposition and neural networks.
Figure 7(c) demonstrates that TextCNN has lower accuracy than TextRNN, LSTM, and Bi-LSTM in classifying the intention of users' legal consulting statements. This is because the convolution kernel is more concerned with the spatial relationship of input data. Convolutional neural networks only consider the current input, whereas recurrent neural networks consider both the current input and previous inputs. Users' legal consulting statements are sequence data, so recurrent architectures suit them better. However, plain recurrent neural networks have difficulty handling long-distance dependencies. When the input user's legal consulting statement is long, recurrent neural networks may experience vanishing or exploding gradients. LSTM-based neural networks mitigate this problem by adding cell states and gating mechanisms. Bi-LSTM combines the outputs of the forward and backward LSTM units. Compared with unidirectional LSTM, Bi-LSTM can achieve higher accuracy.
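The gating mechanism and the combination of forward and backward outputs can be sketched with a small NumPy forward pass. This is a textbook LSTM cell written for illustration only; it is not the TensorFlow model used in the experiments, and the function names, the weight initialization, and the toy dimensions are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_dim, hidden_dim, rng):
    """Weights for the four blocks: input, forget, candidate, and output."""
    w = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
    b = np.zeros(4 * hidden_dim)
    return w, b


def lstm_pass(inputs, w, b, hidden_dim):
    """Run one LSTM over `inputs` (seq_len, input_dim); return all hidden states."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    states = []
    for x in inputs:
        z = w @ np.concatenate([x, h]) + b
        i, f, g, o = np.split(z, 4)
        # The cell state is updated additively through the forget and input gates.
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        # The output gate decides how much of the cell state is exposed.
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)


def bilstm(inputs, fwd, bwd, hidden_dim):
    """Concatenate a forward pass with a backward pass over the reversed input."""
    forward = lstm_pass(inputs, *fwd, hidden_dim)
    backward = lstm_pass(inputs[::-1], *bwd, hidden_dim)[::-1]
    return np.concatenate([forward, backward], axis=1)


rng = np.random.default_rng(0)
seq = rng.standard_normal((7, 3))          # 7 tokens, 3-dimensional embeddings
fwd_params = init_params(3, 4, rng)
bwd_params = init_params(3, 4, rng)
out = bilstm(seq, fwd_params, bwd_params, hidden_dim=4)

# Changing only the last token leaves the forward state of the first token
# untouched, but alters its backward state.
seq_changed = seq.copy()
seq_changed[-1] += 1.0
out_changed = bilstm(seq_changed, fwd_params, bwd_params, hidden_dim=4)
```

The comparison at the end shows why the bidirectional variant sees more context: the representation of each word also depends on the words that follow it.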
Tables 1 and 2 provide the accuracy and Micro-F1 score of a variety of neural networks for intention classification of users' legal consulting statements. TP stands for tensor decomposition. Bi-LSTM with the pattern-oriented tensor decomposition layer has the highest accuracy among all the algorithms. From the perspective of sequence coding, an attention layer can break the limit of fixed-length inputs and calculate the relationship between input sequences and output sequences. Although an attention layer adds a series of weight parameters and learns the weight of each element from the input and output sequences, it does not change the internal structure of the original neural networks. For the problem of intention classification of users' legal consulting statements, an attention layer can hardly compensate for the missing sequence information of TextCNN or for the vanishing and exploding gradient problems of TextRNN.
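For reference, the two reported metrics can be computed as in the sketch below. The category names and predictions are hypothetical and serve only as an illustration; the paper does not state how its scores were computed. Note that when every statement has exactly one true label and one predicted label, micro-averaged F1 coincides with accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of statements whose predicted category is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all categories first."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            tp += int(p == label and t == label)
            fp += int(p == label and t != label)
            fn += int(p != label and t == label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical categories and predictions, for illustration only.
y_true = ["theft", "theft", "divorce", "labor", "labor", "labor"]
y_pred = ["theft", "divorce", "divorce", "labor", "theft", "labor"]

acc = accuracy(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)
```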
Tables 1 and 2 demonstrate that the pattern-oriented tensor decomposition layer has a greater optimization effect on LSTM and Bi-LSTM than on GRU and Bi-GRU. GRU is a simplification of LSTM. LSTM controls the outputs of neural units through the output gate, whereas GRU passes outputs directly to the next neural units. Therefore, GRU converges faster than LSTM. For the optimization of pattern tensors, however, LSTM is better than GRU. The main reason is that GRU is more tightly integrated and has fewer adjustable parameters than LSTM. That is, GRU has a relatively limited capacity to optimize pattern tensors.
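The difference in adjustable parameters can be made concrete with a rough count for a single recurrent layer. The sketch assumes the standard formulations (four weight blocks for LSTM, three for GRU, one bias vector per block) and an input dimension of 300, which is a hypothetical value; only the hidden size of 512 comes from the experimental settings. Framework implementations may add extra bias terms, so exact counts can differ slightly.

```python
def recurrent_params(input_dim: int, hidden_dim: int, gate_blocks: int) -> int:
    """Parameters of one recurrent layer with `gate_blocks` weight blocks."""
    # Each block maps [input, previous hidden state] to the hidden size, plus a bias.
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return gate_blocks * per_block


INPUT_DIM = 300   # hypothetical embedding size, not stated in the paper
HIDDEN_DIM = 512  # hidden layer size from the experimental settings

lstm_params = recurrent_params(INPUT_DIM, HIDDEN_DIM, gate_blocks=4)
gru_params = recurrent_params(INPUT_DIM, HIDDEN_DIM, gate_blocks=3)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, which is consistent with the observation that GRU offers fewer degrees of freedom for optimizing pattern tensors.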

Conclusion
In this study, we propose a new method (i.e., Bi-LSTM with pattern-oriented tensor decomposition) to solve the problem of understanding users' legal intention in the field of legal services. Our method combines a deep neural network with a tensor decomposition method to classify and deeply understand users' legal consulting statements. Our method is more instructive and interpretable than traditional deep neural networks. We propose a new tensor decomposition method that is driven by users' legal consultation patterns and continuously guides the training and update process of deep neural networks.

Data Availability
The processed users' legal consulting statement data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, [6/12 months] after publication of this article, will be considered by the corresponding author.

Figure 1: Intent classification model for users' legal consulting statements.

Figure 5: Statistics on dataset of users' legal consulting statements.

Figure 6: Tensor representation of users' legal consulting statements.
[Figure 7 panels recovered from the plot: (b) experimental results of algorithms based on Bi-LSTM; (c) experimental results of algorithms based on TextCNN; (d) experimental results of algorithms based on TextRNN. Vertical axis: accuracy of classification on multi-class. Legend of panel (c): the original TextCNN, TextCNN with CP tensor decomposition layer, TextCNN with Tucker tensor decomposition layer, and TextCNN with pattern-oriented tensor decomposition layer.]

Figure 7: Experiments on Bi-LSTM and other neural networks.
Given {  } and {  } in Problem 9, we can obtain the optimal solution of χ that minimizes the target function  in Problem 10.

Table 1: Accuracy of algorithms based on multiple deep neural networks.

Table 2: F1 score of algorithms based on multiple deep neural networks.