Semantic Matching Efficiency of Supply and Demand Text on Cross-Border E-Commerce Online Technology Trading Platforms

With the innovation of global trade business models, more and more foreign trade companies are transforming toward cross-border e-commerce. However, limited by the language processing and analysis technology of the platforms, these companies encounter many bottlenecks during transformation and upgrading. From the perspective of the semantic matching efficiency of e-commerce platforms, this paper addresses logical and technical problems in the operation of cross-border e-commerce, takes semantic matching efficiency as the research object, and conducts experiments on the QQP dataset. Existing text semantic matching methods do not consider the semantic dependency information between words in the text and require a large amount of training data. To address this, we propose TextSGN, a graph network text semantic analysis model based on semantic dependency analysis. The model first analyzes the semantic dependencies of the text and performs word embedding and one-hot encoding on the nodes (single words) and edges (dependencies) in the semantic dependency graph. On this basis, an SGN network block is proposed to mine semantic dependencies quickly. The block defines how information is transmitted at the structural level in order to update the nodes and edges of the graph, so that semantic dependency information is mined quickly and the network converges faster. Classification models are trained on multiple public datasets, and classification tests are performed. The experimental results show that the accuracy of the TextSGN model in short text classification reaches 95.2%, which is 3.6% higher than the second-best classification method; the accuracy rate is 86.16% and the F1 value is 88.77%, which is better than other methods.


Introduction
Traditional text similarity research methods mainly use one-hot, bag-of-words, N-gram, and TF-IDF representations as the feature vector of the text and use measures such as cosine similarity to quantify the degree of text similarity. However, these methods use only the statistical information of the text as its feature vector [1] and fail to consider the context of each word. They also suffer from feature sparseness and dimensional explosion during feature extraction. With the development of deep learning [2], deep learning methods have become the mainstream approach to text similarity tasks. Yang et al. proposed the word2vec word vector embedding method; as a neural network language model, this method converts words into multidimensional vector representations, which greatly facilitates follow-up work [3]. Radionova-Girsa and Lahiža proposed the GloVe word vector embedding method. This embedding takes context information into account, expresses the context of the text more accurately, and performs well in multiple natural language processing tasks [4]. Li et al. proposed the TextCNN model in a paper published in 2019 and applied CNN to text classification tasks in natural language processing [5].
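The statistical baseline described above can be illustrated with a minimal sketch: a bag-of-words vector per text, compared by cosine similarity. The helper names and the example sentences are illustrative assumptions, not part of the original experiments.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count term frequencies."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical question pairs in the style of a duplicate-question dataset.
q1 = bag_of_words("how do I learn python")
q2 = bag_of_words("how can I learn python")
q3 = bag_of_words("what is the price of gold")

# Word order is invisible to this representation, which is the limitation
# the paragraph above points out: these two texts look identical.
same_bag = cosine_similarity(bag_of_words("dog bites man"),
                             bag_of_words("man bites dog"))
```

In this sketch `q1` and `q2` share four of five terms, while "dog bites man" and "man bites dog" receive the maximum score even though their meanings differ, because only term counts are compared.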
Tenyakov and Mamon save more information in the form of vectors and train the capsule network with a dynamic routing mechanism [6], which reduces the number of network parameters and works well on the handwritten digit recognition dataset [7]. Sun [16]. The three new business formats are cross-border e-commerce, market procurement, and external comprehensive services. They differ in structure, origin, characteristics, principles, and regulatory service systems [17]. A comparison shows that the new business formats result from the recombination of foreign trade and domestic trade and from changes in the division of labor among the agents involved, and that they have multiple development possibilities [18]. The current new business formats face problems and risks such as erosion and crowding, short-term profit-seeking, and a lack of systems [19]. The next development strategy should be appropriate to the overall score and promote the comprehensive management of the various business types, the innovation of cross-border e-commerce models, the iteration of market procurement mechanisms, the transformation and development of external comprehensive services, and the rapid development of information technology combined with wide-area interconnection. According to its participants, cross-border e-commerce can be divided into three types: enterprise-enterprise (B-B), enterprise-consumer individual (B-C), and individual seller-individual buyer (C-C). Among them, under the B-C type, there are three modes: retail import and export, "haitao," "purchasing," and "overseas warehouse" export [20]. The retail import and export model can be subdivided into four specific models according to its flow and whether it is bonded or not [21]. The model classification of cross-border e-commerce is therefore extremely complicated, and the regulatory codes and flow identifications of the various models are completely different [22]. [23].
At the same time, the traditional capsule network is improved: words that are unrelated to the semantics of the text are regarded as noise capsules and are assigned smaller weights to reduce their impact on subsequent tasks [24]. The text semantic matching method first uses the pretrained GloVe model to map the two texts into 300-dimensional word vector matrices [25]. The word vector matrices are used as the input of the model and are weighted by the attention mechanism module; the results are then fed into the BiGRU network and the capsule network, respectively. In the capsule network, a convolution operation is performed first, followed by the capsule convolution operation of the primary capsule layer, whose output is obtained after applying the squash function. After the dynamic routing mechanism is computed, the result is passed to the classification capsule layer [26]. The output of this layer is flattened and used as the local feature vector of the text. In the BiGRU network, a bidirectional GRU extracts text information from two directions to obtain the global feature vector of the text. In the feature extraction stage, a twin (Siamese) neural network structure is used, that is, the two word vector matrices are processed by the same network structure so that they are encoded into the same vector space.

Wireless Communications and Mobile Computing
Finally, the local features and the global features of the two texts are each analyzed for similarity, and the similarity matrices of the two texts are obtained. The similarity matrices are used as the input of the fully connected network, whose last layer uses the sigmoid function as the classifier to determine whether the two texts are similar [27].
(1) Word Vector Embedding Module. The word vector embedding module first preprocesses the text, including removing stop words and special symbols [28]. Based on an analysis of all texts, this experiment chooses a maximum sentence length of 25 characters: sentences with fewer than 25 characters are padded, and sentences with more than 25 characters are truncated to the first 25 characters. The GloVe model pretrained by the Natural Language Processing Group of Stanford University is used to convert each word in the text into a 300-dimensional word vector. GloVe is built on a co-occurrence matrix that records the number of times one word appears in the context of another word in the corpus, from which the probability of a word appearing in the context of another word is obtained [29]. Given the word vectors of two words, their similarity is calculated [30] and compared with the co-occurrence statistics. When the difference is small, the word vectors are consistent with the co-occurrence matrix and capture the context information accurately. A cost value represents the difference between the two terms, with bias terms added to the similarity. By iteratively updating the word vectors of all words so that the cost over the entire corpus is minimized, the optimal word vectors of all words in the corpus are obtained; in this way, the word vector of a word is computed from its context information [31]. The dataset contains a large amount of English text, and the word vectors obtained by pretraining carry more accurate context information. Pretrained vectors with 50, 100, 200, and 300 dimensions have been released. This paper uses the 300-dimensional word vectors released by the Natural Language Processing Group of Stanford University as the word vector representation.
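For reference, the cost described above matches the weighted least-squares objective of the published GloVe model. The notation below ($X_{ij}$, $w_i$, $\tilde{w}_j$, $b_i$, $\tilde{b}_j$, $f$, $V$, $x_{\max}$, $\alpha$) follows the GloVe formulation and is an assumption about the symbols missing from this section, not a transcription of them.

```latex
% Co-occurrence counts and context probability
X_i = \sum_{k} X_{ik}, \qquad P_{ij} = \frac{X_{ij}}{X_i}

% Weighted least-squares cost over a vocabulary of size V
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

% Weighting function that limits the influence of very frequent pairs
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha}, & x < x_{\max} \\
  1, & \text{otherwise}
\end{cases}
```

Here $X_{ij}$ counts how often word $j$ occurs in the context of word $i$, $w_i$ and $\tilde{w}_j$ are the word and context vectors, and $b_i$, $\tilde{b}_j$ are the bias terms; minimizing $J$ drives the inner product of two word vectors toward the logarithm of their co-occurrence count.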
(2) Feature Matrix Extraction Module. In natural language processing, the traditional attention model mainly identifies the words in a text that are most relevant to the task and gives them higher attention. Such a model performs well on single-sentence tasks. For the task of this article, text similarity, the main concern is whether the two texts are similar. For two input texts $t_1$ and $t_2$, more attention should therefore be paid to the parts that $t_1$ and $t_2$ have in common. To this end, the similarity between each word in $t_1$ and all words in $t_2$ is calculated and summed. Cosine similarity is used as the similarity measure, and the sum of the cosine similarities determines the weight of the word. Suppose that the word vector matrices obtained from texts $t_1$ and $t_2$ through the word vector embedding layer are

$$W^{t_1} = [w^{t_1}_1, w^{t_1}_2, \ldots, w^{t_1}_n], \qquad W^{t_2} = [w^{t_2}_1, w^{t_2}_2, \ldots, w^{t_2}_m].$$

From these matrices, the semantic analyzer obtains the matching degree: the cosine similarity formula is used to calculate the similarity between every word of one input text and all words of the other text,

$$k^{t_1}_i = \sum_{j=1}^{m} \cos\left(w^{t_1}_i, w^{t_2}_j\right), \qquad k^{t_2}_j = \sum_{i=1}^{n} \cos\left(w^{t_2}_j, w^{t_1}_i\right),$$

where $k^{t_1}_i$ is the sum of the cosine similarities between the $i$th word in text $t_1$ and each word in text $t_2$, and $k^{t_2}_j$ is defined in the same way for text $t_2$. These sums are used as the values from which the weight of each word is calculated. The word weights are obtained by applying the SoftMax function to $k^{t_1}$ and $k^{t_2}$:

$$A^{t_1} = \mathrm{SoftMax}(k^{t_1}), \qquad A^{t_2} = \mathrm{SoftMax}(k^{t_2}).$$

$A^{t_1}$ and $A^{t_2}$ are the weights corresponding to each word of the two texts. Multiplying the word vector of each word by its weight gives the feature matrix of the text, which is used as the input of the subsequent network.
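The mutual attention step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the toy 3-dimensional vectors stand in for the 300-dimensional GloVe embeddings.

```python
import numpy as np

def cosine_matrix(w1, w2, eps=1e-8):
    """Pairwise cosine similarity: entry (i, j) compares w1[i] with w2[j]."""
    n1 = w1 / (np.linalg.norm(w1, axis=1, keepdims=True) + eps)
    n2 = w2 / (np.linalg.norm(w2, axis=1, keepdims=True) + eps)
    return n1 @ n2.T

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mutual_attention(w1, w2):
    """Weight each word vector by its summed similarity to the other text."""
    sim = cosine_matrix(w1, w2)      # shape (len(t1), len(t2))
    k1 = sim.sum(axis=1)             # one similarity sum per word of t1
    k2 = sim.sum(axis=0)             # one similarity sum per word of t2
    a1, a2 = softmax(k1), softmax(k2)
    return w1 * a1[:, None], w2 * a2[:, None], a1, a2

# Toy 3-dimensional "embeddings" (illustrative values, not GloVe vectors).
t1 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
t2 = np.array([[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 0.0, 1.0]])
f1, f2, a1, a2 = mutual_attention(t1, t2)
```

In this toy input the first word of `t1` resembles two words of `t2` and therefore receives the larger weight, while the third word of `t2` has no counterpart in `t1` and receives the smallest weight.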
The text contains a large number of articles, conjunctions, interjections, and other words unrelated to its semantics. These words have a high probability of occurring in both texts, so they can receive high weights after the attention module is applied. However, they do not significantly affect the semantics of the text, and assigning them a greater weight distorts the final result. In the capsule network module, these unrelated words are called noise capsules. The NLTK tool is used to tag the part of speech of each word in the sentence. In the capsule network, determiners, conjunctions, interjections, and pronouns are first assigned lower weights according to their part of speech, which reduces the impact of noise capsules on subsequent tasks. The feature matrix produced by the attention mechanism is then fed into the capsule network, and the dynamic routing algorithm is used to calculate the output of the upper-layer capsules. The calculation steps are as follows:

Initialize: $\hat{u}_{j|i} = w_{ij} A_i$, $b_{ij} = 0$.

Iterate $r$ times:

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})}, \qquad s_j = \sum_{i} c_{ij} \hat{u}_{j|i}, \qquad v_j = \mathrm{squash}(s_j) = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2} \frac{s_j}{\lVert s_j \rVert}, \qquad b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j.$$

Return $v_j$.

Here $u_i$ is the feature vector obtained by the mutual attention module, $A_i$ is the feature vector after reducing the weight of the noise capsule, $r$ is the number of iterations of the dynamic routing algorithm, $w_{ij}$ is the weight matrix between the two layers of capsules, $c_{ij}$ is the coupling coefficient, which expresses how likely lower-layer capsule $i$ is to activate upper-layer capsule $j$, $\hat{u}_{j|i}$ is the prediction vector that capsule $i$ sends toward upper-layer capsule $j$, $s_j$ is the input of the upper-layer capsule, squash is the activation function, and $v_j$ is the output of the upper-layer capsule. The dynamic routing algorithm sets the initial value of $b_{ij}$ to 0, so that the initial coupling coefficients $c_{ij}$ are uniform; $b_{ij}$ is then updated through iteration, which in turn updates $c_{ij}$. The parameters $w_{ij}$ that produce $\hat{u}_{j|i}$ are learned by the model from a large amount of training data.
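The routing procedure described above can be sketched as follows. This is a minimal NumPy illustration of routing-by-agreement in the form published by Sabour et al., not the authors' implementation; the array shapes, the random weights `W`, and the toy inputs `A` are assumptions made for the example.

```python
import numpy as np

def squash(s, eps=1e-8):
    """Shrink each vector's length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s * s, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, r=3):
    """Route lower capsules to upper capsules.

    u_hat has shape (num_lower, num_upper, dim); u_hat[i, j] is the
    prediction of lower capsule i for upper capsule j.
    """
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))            # routing logits b_ij
    for _ in range(r):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)        # coupling coefficients c_ij
        s = np.einsum("ij,ijd->jd", c, u_hat)       # weighted sum per upper capsule
        v = squash(s)                               # upper capsule outputs v_j
        b = b + np.einsum("ijd,jd->ij", u_hat, v)   # agreement update
    return v, c

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))          # 6 lower capsules with 4 dims (toy values)
W = rng.normal(size=(6, 2, 8, 4))    # hypothetical w_ij: 2 upper capsules, 8 dims
u_hat = np.einsum("ijkd,id->ijk", W, A)
v, c = dynamic_routing(u_hat, r=3)
```

Because `b` starts at zero, the first iteration uses uniform coupling coefficients; later iterations shift weight toward the upper capsules whose outputs agree with a lower capsule's prediction.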
The capsule network proposed by Sabour includes a three-layer structure: a convolutional layer, a PrimaryCaps layer, and a DigitCaps layer. In the method proposed in this paper, the output of the DigitCaps layer is used as the local feature matrix of the text. The bidirectional gated recurrent unit network (BiGRU) is a bidirectional recurrent neural network based on gating, composed of a forward GRU and a backward GRU. The text is traversed in both directions so that the information obtained for each word includes both the preceding and the following context. This solves the problem that a single GRU can only capture the preceding context. The GRU is a variant of the long short-term memory network (LSTM). Compared with LSTM, the GRU has a simpler network structure but achieves essentially the same effect, which greatly reduces the time required for network training. In a recurrent neural network, the output at the current time step depends on the output at the previous time step, which gives the network memory and makes it suitable for processing sequence data. The GRU merges the input gate and the forget gate of LSTM into a single gate, called the update gate. In the GRU, $x_t$ is the input at the current moment, $h_{t-1}$ is the hidden state at the previous moment, $\tilde{h}_t$ is the candidate state at the current moment, $h_t$ is the hidden state at the current moment, and $y_t$ is the output at the current moment. In a GRU network, information can only be transmitted in one direction, but in practice each word may depend on words both before and after it. Training on the text in both directions with the BiGRU network makes the model more effective. The method proposed in this paper uses the output of the BiGRU network as the global feature matrix of the text. The local feature matrices and the global feature matrices of the two texts are each compared for similarity, which yields the local similarity matrix $E^1$ and the global similarity matrix $E^2$.
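To make the BiGRU description concrete, the sketch below implements one common formulation of the GRU update (update gate, reset gate, candidate state, interpolation) and runs it in both directions. It is an illustration under stated assumptions: bias terms are omitted, the hidden size of 64 and the random weights are hypothetical, and the exact gate equations used by the authors are not given in this section.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(input_dim, hidden_dim, rng):
    """Random parameters for one GRU direction (illustrative initialization)."""
    shape = (hidden_dim, hidden_dim + input_dim)
    return {name: rng.normal(scale=0.1, size=shape) for name in ("Wz", "Wr", "Wh")}

def gru_step(p, x_t, h_prev):
    """One GRU update: gates, then candidate state, then interpolation."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(p["Wz"] @ hx)                                       # update gate
    r = sigmoid(p["Wr"] @ hx)                                       # reset gate
    h_cand = np.tanh(p["Wh"] @ np.concatenate([r * h_prev, x_t]))   # candidate state
    return (1.0 - z) * h_prev + z * h_cand                          # new hidden state

def run_gru(p, xs):
    """Apply the GRU over a sequence and return all hidden states."""
    h = np.zeros(p["Wz"].shape[0])
    states = []
    for x_t in xs:
        h = gru_step(p, x_t, h)
        states.append(h)
    return np.stack(states)

def bigru(p_fwd, p_bwd, xs):
    """Concatenate forward states with time-realigned backward states."""
    fwd = run_gru(p_fwd, xs)
    bwd = run_gru(p_bwd, xs[::-1])[::-1]
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
xs = rng.normal(size=(25, 300))    # 25 tokens with 300-dim embeddings (toy values)
p_fwd, p_bwd = init_gru(300, 64, rng), init_gru(300, 64, rng)
global_features = bigru(p_fwd, p_bwd, xs)
```

The forward half of each row depends only on the tokens up to that position, while the backward half depends only on the tokens from that position onward, which is how the concatenated state carries context from both sides.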
$E^1$ and $E^2$ are calculated in the same way, so only the calculation of $E^1$ is described here. Assume that the local features of the two texts are $S^1$ and $S^2$, respectively. In the formula for $E^1$, $E^1_{ij}$ is the element in the $i$th row and $j$th column of the similarity matrix, $S^1_i$ is the $i$th row of $S^1$, and $S^2_j$ is the $j$th row of $S^2$. After the similarity matrices are obtained, the two matrices are flattened and concatenated. The fused similarity vector is used as the input of the fully connected layer, and the output of the fully connected network is passed to the sigmoid classifier, which determines whether the two texts are similar.
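The fusion step can be sketched as below. The element-wise formula for the similarity matrix is not recoverable from this section, so the example assumes cosine similarity between rows; the feature shapes, the single dense unit, and the untrained weights are likewise hypothetical and serve only to show the data flow.

```python
import numpy as np

def similarity_matrix(s1, s2, eps=1e-8):
    """E[i, j] = cosine similarity of row i of s1 and row j of s2 (assumed measure)."""
    n1 = s1 / (np.linalg.norm(s1, axis=1, keepdims=True) + eps)
    n2 = s2 / (np.linalg.norm(s2, axis=1, keepdims=True) + eps)
    return n1 @ n2.T

def predict_similar(local1, local2, global1, global2, w, b):
    """Flatten and concatenate both similarity matrices, then apply one dense unit."""
    e1 = similarity_matrix(local1, local2)      # local (capsule) similarity
    e2 = similarity_matrix(global1, global2)    # global (BiGRU) similarity
    fused = np.concatenate([e1.ravel(), e2.ravel()])
    logit = fused @ w + b
    return 1.0 / (1.0 + np.exp(-logit)), fused  # sigmoid probability, fused vector

rng = np.random.default_rng(0)
local1, local2 = rng.normal(size=(10, 16)), rng.normal(size=(10, 16))
global1, global2 = rng.normal(size=(25, 128)), rng.normal(size=(25, 128))
w = rng.normal(scale=0.01, size=10 * 10 + 25 * 25)   # untrained, hypothetical weights
prob, fused = predict_similar(local1, local2, global1, global2, w, b=0.0)
```

In a trained model the weights `w` and bias `b` would be learned, and a probability above a chosen threshold (for example 0.5) would be read as "similar".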

Evaluation Model of Semantic Matching Efficiency of Supply and Demand Text. The text classification methods proposed so far fall into two categories: traditional classification algorithms and classification algorithms based on deep learning. Traditional classification methods use feature engineering and feature selection to extract features from the original documents and then feed the extracted features into classifiers such as SVM and KNN for training and prediction. Classic feature extraction methods include the frequency method, pointwise mutual information (PMI), term frequency-inverse document frequency (TF-IDF), and N-gram. With the popularity of deep learning, more and more researchers use deep learning methods to classify text, mainly the convolutional neural network (CNN) and its improved variants. TextCNN, for example, represents text with trained word vectors and applies the local correlation features of CNN to text classification. However, methods built on TextCNN do not mine the potential semantic relationships between words at the semantic level when processing text [32] and therefore do not directly represent the internal meaning of the text. In recent years, the graph convolutional neural network (GCN) has attracted widespread attention in the academic community as an emerging research direction. GCN is an extension of CNN to the irregular domain and is mainly used to process irregular graph-structured data. The CRF classifier model and the neural network classifier model each have advantages and disadvantages [27]. The CRF model requires the corpus to be annotated manually in advance and features such as the part of speech and degree of a word to be designed by hand, while the neural network model can learn from the training data to automatically generate feature vectors and achieve better results. However, neural network models often require longer training time, and some of their outputs are invalid label sequences in named entity recognition. Therefore, CRF is needed to add the rules of named entities to the sequence labeling process afterward. This paper combines the characteristics of the CRF and neural network models to obtain a joint model with better performance. Learning and prediction in the CRF model are performed on multiple features of the sample. The CRF model itself can generate feature vectors and perform classification. This article uses the features extracted by the hybrid neural network as the intermediate quantity that replaces the vector value in the original formula. The emission probability in the CRF classifier model is the probability that a word in the sequence belongs to each sentiment category [33]. The transition probability is the probability of moving from one label class to an adjacent label class. The emission probability of a conventional CRF classifier is generated from a feature template; here, the features automatically extracted by the hybrid neural network are used as the emission probability to obtain better context information.
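The interplay of emission and transition scores can be illustrated with Viterbi decoding, the standard way to find the best label sequence in a linear-chain CRF. The sketch below is an assumption-laden example rather than the authors' code: the scores are toy values in the log domain, the three labels are hypothetical, and a large negative transition score stands in for a forbidden label pair.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Best label sequence under emission + transition scores (log domain).

    emissions: (seq_len, num_labels) per-token label scores, e.g. network outputs.
    transitions: (num_labels, num_labels); transitions[a, b] scores moving a -> b.
    """
    seq_len, num_labels = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, num_labels), dtype=int)
    for t in range(1, seq_len):
        # total[a, b]: best score ending in label a at t-1, then moving to b at t
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

# Toy example with hypothetical labels 0, 1, 2; the transition 0 -> 2 is forbidden.
emissions = np.array([[2.0, 1.0, 0.0],
                      [1.0, 0.0, 1.5],
                      [0.0, 0.5, 2.0]])
transitions = np.zeros((3, 3))
transitions[0, 2] = -1e4
path = viterbi_decode(emissions, transitions)      # CRF-style decoding
greedy = emissions.argmax(axis=1).tolist()         # per-token argmax, no transitions
```

Per-token argmax picks labels 0, 2, 2 and so uses the forbidden 0 -> 2 transition, whereas the decoded path 1, 2, 2 avoids it. This is the kind of invalid output that the CRF layer is meant to rule out.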

Online Text Semantic Analysis Research
Model Construction

Experimental Analysis of Online Text Semantic Analysis Model
In experiment (1), mainstream models in the deep learning field are selected for comparison, including LSTM, BiLSTM, capsule, GRU, BiGRU, Siamese-capsule, Siamese-BiGRU, and capsule-BiGRU, and experiments are performed with each of them. The experimental results are shown in Table 2.
As shown in Figure 1, compared with traditional CNN and LSTM networks, the model proposed in this paper performs better in text similarity tasks. The performance of the GRU network and the LSTM network in the task is basically the same, but at the same network scale, the time required to train the GRU network is much less than training the LSTM network.
As shown in Figure 2, comparing capsule with Siamese-capsule and BiGRU with Siamese-BiGRU shows that, relative to the BiGRU network, the Siamese-BiGRU network improves accuracy by 2.52%, precision by 2.99%, recall by 1.31%, and the F1 value by 2.19%. Relative to the capsule network, the Siamese-capsule network improves accuracy by 1.88%, precision by 3.63%, recall by 1.93%, and the F1 value by 1.78%. The twin neural network structure therefore effectively improves the performance of the model. As shown in Figure 3, when the model in this paper is compared with the traditional neural network structures, shared parameters such as Batch_size and Epoch are kept consistent, because changes in these parameters affect the experimental results. Although this model performs worse than the CNN_LSTM and BiLSTM models at the beginning of the iterations, it gradually overtakes the traditional models and stays ahead of them from the middle of training onward.
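The metrics compared in these experiments (accuracy, precision, recall, and F1) follow from the counts of a binary confusion matrix. The helper below is a minimal sketch with made-up labels, included only to pin down the definitions; it does not reproduce any number from the tables.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = similar)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical gold labels and predictions for eight text pairs.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

F1 is the harmonic mean of precision and recall, so it rises only when both improve, which is why it is reported alongside accuracy for the paired-text datasets.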
As shown in Figure 4, compared with CNN_LSTM and BiLSTM, the accuracy of the G-Caps model is increased by 5.3% and 7.6%, respectively. The model in this paper extracts vector features as effective information and has achieved good classification results compared with traditional network structure models.
The processing effect of the text similarity analysis model based on capsule-BiGRU is shown in Figure 5. Compared with the traditional LSTM model, the accuracy rate has increased by 6.08%, and the F1 value has increased by 4.49%. In experiment (2), the method proposed in this paper is compared with the methods proposed in other papers, and the comparison results are shown in Table 3.
Through comparison, it can be found that, compared with the original model, the accuracy of the proposed method is increased by 1.58% and the F1 value by 3.75%. Compared with the direct comparison model, the accuracy is increased by 0.66% and the F1 value by 1.67%. The comparison model uses a 6-layer stacked BiLSTM network, so it is more complex and takes longer to train.
Due to the small number of samples in the MRPC dataset, the dropout parameter is adjusted to 0.1, and the other model parameters are left unchanged. As can be seen in Figure 6, the model performs better on the QQP dataset because the QQP dataset has a larger number of samples and the model is trained more fully, which indicates that the performance of the proposed model depends on the number of samples in the dataset. In experiment (3), the number of iterations of the dynamic routing algorithm in the capsule network was varied, and the results are shown in Table 4. The experimental results show that the number of iterations of the dynamic routing algorithm has a certain impact on the capsule network. As the number of iterations increases, the time required to train the model keeps increasing. When the number of iterations is set to 3, the model performs well and the training time is 198 min. Beyond 3 iterations, the performance of the model gradually decreases. In the other experiments in this article, the number of dynamic routing iterations of the capsule network is therefore set to 3.
As shown in Figure 7, compared with the neural network models, the CRF single model has lower classification accuracy and F value, which indicates a real gap between traditional machine learning methods and deep learning in sentiment analysis. The convergence speed of this model is not much different from that of the CRF single model, and it is better than the other models in terms of accuracy and F value, which proves the effectiveness of the model.
As shown in Figure 8, compared with the improved neural networks Text-CNN and CL_CNN, the model in this paper has a simpler network structure; compared with the rule-based network model CNN-Rule, the model in this paper requires no manually defined rules. As shown in Table 5, apart from the model in this paper, CL_CNN has the best classification effect, with an accuracy of 84.3%. G-Caps is still 0.5% higher, which supports the design proposed in this paper: the GRU model captures the overall contextual semantic information, and the capsule model then extracts feature information from that semantic information.
As shown in Figure 9, TextSGN performs well on indicators such as accuracy, recall, and F value and achieves better results in semantic analysis tasks. The model in this paper improves the classification effect compared with the CNN+BiGRU model. Because it uses CRF as the classifier instead of the SoftMax function, it handles abnormal tags more accurately, which effectively improves the performance of the sentiment classifier. Compared with the BiGRU+CRF model, the F value of the model in this paper is improved and convergence is faster. The features obtained by the two kinds of neural networks are richer than those of a single network, and the accuracy is improved. Experiments show that, compared with the fusion models, the model in this paper achieves a better effect on sentiment analysis tasks.

Conclusions
In recent years, with the development of deep learning, deep learning has been widely used in text similarity tasks. Because convolutional neural networks and recurrent neural networks have shown good performance in tasks in various fields, they have become the two main neural network model structures. The convolutional neural network can effectively extract the local features of the text by processing the word vector matrix, but it cannot consider the context information of the text and sometimes cannot express the true meaning of the text. The recurrent neural network, whose structure has memory, can also be used to extract text feature vectors: it takes the sequence information of the words into account and uses the context of the text to extract its global features. However, the recurrent neural network cannot extract text features well in the presence of long-distance dependencies.
Aiming at the task of text similarity, this paper proposes a text similarity analysis method based on capsule-BiGRU. The capsule network effectively extracts the local feature vector of the text. The BiGRU network uses a bidirectional recurrent structure to traverse the entire text from two directions, thereby effectively extracting context information to obtain the global feature matrix of the text. Similarity analysis is then performed on the feature matrices of the two texts to determine whether the texts are similar. Experiments show that the method proposed in this paper achieves better results on text similarity tasks.
Aiming at the problem that traditional neural network models cannot extract text features well, a text similarity analysis method based on capsule-BiGRU is proposed. This method combines the local feature matrix of the text extracted by the capsule network with the global feature matrix of the text extracted by the bidirectional gated recurrent unit network (BiGRU). The two kinds of features are analyzed separately to obtain the similarity matrices of the texts, and the similarity matrices are merged to obtain the multilevel similarity vector of the two texts, from which the text similarity is determined. The existing capsule network is improved: words that are not related to the meaning of the text are treated as noise capsules and are assigned smaller weights to reduce their impact on subsequent work. For the task of text similarity, a mutual attention mechanism is added before the text feature matrix extraction. For the two texts to be analyzed, the word vectors are weighted by calculating the similarity between each word in one text and all words in the other text, which allows the similarity of the texts to be determined more accurately.
Figure 9: Comparison of this model with other models.

Data Availability
This article is not supported by data.

Conflicts of Interest
The author declares no conflicts of interest.