A Novel Joint Entity Relation Extraction Based on Capsule Network and Part-of-Speech Weighting

With the development of science and technology, the number of science and technology policies is increasing year by year. Science and technology policies are documents in text form, characterized by rigorous structure, clear hierarchy, and standard language. Mining template information from policies can optimize data templates and improve the efficiency of recommending data to users. This paper proposes a joint entity relation extraction model based on capsule networks and part-of-speech weighting. To learn more feature information from word vectors, a capsule network based on the bidirectional gated recurrent unit (BGRU) is used in place of the traditional convolutional neural network. Because word vectors express semantics imperfectly, part-of-speech features are added to enrich the text information. Meanwhile, to solve the weight distribution problem between word features and part-of-speech features, an artificial fish swarm algorithm is used to optimize the two feature weights iteratively, and the effectiveness of the proposed model is demonstrated by experiments.


Introduction
Big data is gradually becoming a new model and element for promoting social development and the construction of a modern governance system, and it has had a huge impact on national governance, science and technology decision-making, and intelligence services. Making effective use of massive data resources and big data analysis methods and technologies to support decision makers has become a point of consensus [1] in the field of intelligence services research. Science and technology policy is a document in text form, characterized by rigorous structure, clear hierarchy, and standard language. Mining template information from policies can optimize data templates and improve the efficiency of recommending data to users. By extracting entity relations from science and technology policies and storing thematic data in a template database, time and manpower can be saved, data can be used more efficiently, and the user's data-viewing experience can be improved, which is of great significance for supporting project research directions and the formulation of science and technology policy.
Entity relation extraction refers to the extraction of relation triples (entity 1, entity 2, relation) from sentences according to context semantics, and it is one of the important subtasks of information extraction [2]. It is widely used in text summarization, knowledge graphs, and search engines. Relation extraction can be divided into two types according to the set of extracted relations. Restricted relation extraction requires the relation set to be determined in advance and decides which category the relation between two entities belongs to according to the entities and the context semantics, much like text classification. Open relation extraction has no predetermined relation set, and the target domain of extraction is not fixed.
In this paper, a capsule network joint entity relation extraction model based on the bidirectional gated recurrent unit (BGRU) is constructed. Experiments show that the macro-average F1 value of the BLSTM-based model is higher than that of the BGRU-based model, while the running time of the BGRU-based model is shorter than that of the BLSTM-based model. Considering the imperfect semantic expression of word vectors and the location information contained in the capsule network, two part-of-speech weighted models based on self_attention are constructed by adding part-of-speech features to enrich the text information, and the validity of the late combination model is verified by experiments. To address the weight distribution of word features and part-of-speech features, an artificial fish swarm algorithm is used to optimize the weights of the two features iteratively and improve the classification effect. The effectiveness of the two optimization schemes is demonstrated by experiments.

Entity Relation Extraction.
Supervised entity relation extraction based on deep learning mainly includes pipelined extraction and joint extraction [2]. Pipelined extraction first performs named entity recognition (NER) and then judges the relation between the recognized entities. This approach is convenient to handle and flexible to combine, but it ignores the association between the two subtasks and accumulates errors from one stage to the next [3]. Joint extraction builds entity recognition and relation extraction within one model and improves the extraction effect by strengthening the dependence between the two subtasks through a shared encoding layer. Miwa and Bansal [4] used neural networks for joint relation extraction to improve the accuracy of the model, which was the first neural joint extraction model. The joint extraction model designed by Li et al. [5] contains three layers: word embedding, sequence, and dependency layers. The proposed model obtains good results on the SemEval-2008 dataset. Zheng et al. [6] use Bi-LSTM at the encoding layer to extract text semantic information as a shared layer for entity recognition and relation extraction, and feed the output of the encoding layer into the entity recognition model and the relation extraction model. The authors evaluated the model on the public ACE05 dataset, where it reached the best level reported at the time, demonstrating the effectiveness of the joint extraction model. Zhang et al. [7] combine convolutional neural networks with support vector machines and conditional random fields; the joint neural network model is constructed through parameter sharing and achieves good results on a drug manual corpus. Ma et al. [8] put forward a joint entity and relation extraction model with a feedback mechanism, which modifies the parameters of the feature extraction layer and the entity recognition model according to the weighted loss and the feedback of the two loss values.

Capsule Network.
The capsule network is a variant of the convolutional neural network, first proposed by Geoffrey Hinton et al. It uses capsules in place of the neuron nodes of the traditional neural network. A capsule is a vector of neurons whose values encode the feature, direction, state, and other information of an entity. The relation between words is trained by dynamic routing, and the feature vectors are then clustered to capture the spatial features of the text. The network includes an input layer, a convolution layer, a primary capsule layer, a category capsule layer, and an output layer. Sabour et al. [9] built a capsule network architecture that surpassed CNN on the MNIST dataset and achieved the most advanced performance at that time. Wang et al. [10] designed a hybrid deep network model based on the capsule network and the recurrent neural network and verified its effectiveness against popular text classification methods. The authors show that the location information of the text and the relation between part and whole can be learned through the capsule network, which enriches semantics and reduces the loss of feature information. The integrated transformer-capsule model proposed by Tang et al. [11] shows that the capsule network can effectively extract phrase features from text.

Entity Relation Extraction Based on Capsule Network.
This section constructs a parameter-shared joint entity relation extraction model based on the capsule network. As shown in Figure 1, the model is mainly composed of four parts: vector representation layer, feature learning layer, entity recognition layer, and relation classification layer.
(1) Vector representation layer. Because the computer cannot process text directly, words are embedded into vectors to represent text features, and word2vec [12] is used to train domain word vectors to improve semantic accuracy.
(2) Feature learning layer. Word vectors can only represent the general meaning of the text, so BGRU [13] is used to learn higher-level contextual semantic features from the word vector representation of sentences.
(3) Weight adjustment layer. Because the learned text feature weights are not well distributed and resources cannot be used efficiently, the self-attention mechanism [14] is used to compute weights as the weight adjustment layer. The vector representation layer, feature learning layer, and weight adjustment layer form the shared part of the model.
(4) Entity recognition layer. A conditional random field (CRF) is used to label the text sequence. The CRF is a typical discriminative undirected probabilistic graphical model, proposed by Lafferty et al. [15] in 2001. It combines the characteristics of the hidden Markov model (HMM) [16] and the maximum entropy Markov model (MEMM) [17], but it does not rely on the two restrictive assumptions of the HMM (the homogeneous Markov assumption and the observation independence assumption), which makes the algorithm more flexible in design and able to use more contextual information. Compared with the MEMM, the CRF computes globally normalized probabilities rather than locally normalized ones, so the label bias problem is solved. McCallum [18] took the lead in using it for named entity recognition, and after continuous improvement it has become the most successful method in named entity recognition [19]. Following the BIOS tagging scheme, each word in the text is marked with a tag. Words tagged B and I are combined into one entity: a single B tag marks the beginning of the entity, and any number of I tags may follow it consecutively. The O tag indicates that the current word does not belong to any entity and is not processed. The S tag marks a word that forms an entity by itself, and consecutive S tags are recognized as separate entities.
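The BIOS decoding rule described for the entity recognition layer can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `decode_bios`, the example sentence, and the choice to ignore a stray I tag that has no preceding B are assumptions made for illustration.

```python
from typing import List, Tuple


def decode_bios(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Turn B/I/O/S tags into (start, end_exclusive, text) entity spans.

    Hypothetical helper: a stray I without a preceding B is ignored,
    which is an assumption, not something the paper specifies.
    """
    entities: List[Tuple[int, int, str]] = []
    start = None  # index of the currently open B tag, if any

    def close(end: int) -> None:
        # Emit the entity that started at `start` and ends before `end`.
        entities.append((start, end, " ".join(tokens[start:end])))

    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                close(i)          # a new B ends the previous entity
            start = i
        elif tag == "I":
            continue              # extends the open entity (or is ignored)
        else:                     # "O" or "S" both end any open entity
            if start is not None:
                close(i)
                start = None
            if tag == "S":
                entities.append((i, i + 1, tokens[i]))  # single-word entity
    if start is not None:
        close(len(tags))          # entity running to the end of the sentence
    return entities


tokens = ["Ministry", "of", "Science", "issued", "the", "plan", "policy"]
tags = ["B", "I", "I", "O", "O", "S", "S"]
spans = decode_bios(tokens, tags)
```

In this example the B-I-I run is merged into one entity, while the two consecutive S tags are returned as two separate entities, matching the rule stated above.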
(5) Relation classification layer. The capsule network uses capsules in place of the neuron nodes of the traditional neural network. A capsule is a vector of neurons whose values encode the feature, direction, state, and other information of an entity. The relation between words is trained by dynamic routing, and the feature vectors are then clustered to capture the spatial features of the text.
The first layer is the input layer, in which the weight-adjusted feature encoding and the recognized entity vectors are spliced as the input of the capsule network, $ABSR = [absr_1, \ldots, absr_i, \ldots, absr_L]$.
The second layer is the convolution layer, which extracts sentence features through standard N-gram convolution; the size of the convolution kernel determines the quality of feature extraction. The input ABSR of the capsule network is $L \times V$ dimensional data, where $L$ is the data length after splicing the sentence features and entity vectors and $V$ is the dimension of the embedded word vector. A filter with window size $K$ is convolved over the sentence with a stride of 1, as shown in formula (1):

$$m_i^a = f\left(z_{i:i+K-1} \circ M_a + b_0\right) \quad (1)$$

Here $M_a \in \mathbb{R}^{K \times V}$ is the filter used for the convolution operation, $\circ$ denotes element-wise multiplication followed by summation, $f$ is the ReLU activation function, $b_0$ is a bias term, $z_{i:i+K-1}$ is the window of input data from position $i$ to $i + K - 1$, and $m_i^a$ denotes the resulting feature at position $i$. Sliding the filter over the whole input eventually yields a feature sequence.

The third layer is PrimaryCaps. The data extracted by the convolution layer is still a scalar representation. The primary capsule layer converts the features extracted by the N-gram convolution layer into capsules and retains the instantiation parameters of the features in the form of vectors. In the shared-parameter mode, each N-gram vector is multiplied by the shared parameters and then transformed into vector neurons for dynamic routing.

The fourth layer is ClassCaps. The dynamic routing algorithm completes the transformation from the primary capsule layer to the category capsule layer and ensures that the information of each child capsule is sent to the corresponding parent capsule through a nonlinear mapping. The connection strength between child and parent capsules is strengthened or weakened over several iterations, which makes dynamic routing more effective than the max pooling of CNN, which loses much of the spatial information in the text [20]. In the category capsule layer, the modulus of each vector is computed; the larger the modulus, the more likely the corresponding category.
Dynamic routing is the main part of the algorithm and the computing model is shown in Figure 2.
In Figure 2, $b_{ij}$ is the $j$-th routing weight in the $i$-th iteration, with an initial default of 0; $v_1$ and $v_2$ are the feature vectors output by the preceding capsule layer; $W$ is the transformation matrix applied to each vector; $r$ is the number of iterations; and $c_i$ indicates the degree of agreement between the lower-layer capsule and the upper-layer capsule. In each iteration, the weighted sum of the transformed vectors is compressed with the Squash function, which serves as the activation function, as shown in formula (2):

$$\mathrm{Squash}(s) = \frac{\lVert s \rVert^2}{1 + \lVert s \rVert^2} \cdot \frac{s}{\lVert s \rVert} \quad (2)$$

where $s$ is the weighted sum of the transformed input vectors. The larger the modulus of the output vector, the stronger the feature. At the end of each calculation, the $b$ values are updated to start the next round, until $r$ iterations have been completed.
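Dynamic routing with the Squash activation can be sketched as follows. The sketch follows the routing-by-agreement procedure of Sabour et al. [9] and indexes the routing weights as `b[i][j]` (child capsule i, parent capsule j), which differs from the per-iteration indexing of Figure 2. The names `squash`, `softmax`, `dynamic_routing`, and `u_hat`, as well as the toy vectors, are assumptions made for illustration, and the predictions `u_hat` are taken as already multiplied by the transformation matrix W.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Shrink a vector to length < 1 while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], r: int = 3) -> List[Vector]:
    """u_hat[i][j]: prediction of child capsule i for parent capsule j."""
    n_child, n_parent, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_parent for _ in range(n_child)]   # routing weights start at 0
    v: List[Vector] = []
    for _ in range(r):
        c = [softmax(row) for row in b]              # coupling coefficients
        v = []
        for j in range(n_parent):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_child))
                   for d in range(dim)]
            v.append(squash(s_j))                    # parent capsule output
        for i in range(n_child):                     # agreement update
            for j in range(n_parent):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Two child capsules agree on parent 0 and disagree on parent 1.
u_hat = [[[1.0, 0.0], [1.0, 0.0]],
         [[1.0, 0.0], [-1.0, 0.0]]]
parents = dynamic_routing(u_hat, r=3)
norms = [sum(x * x for x in p) ** 0.5 for p in parents]
```

In the toy example the parent capsule on which both children agree ends up with the longer output vector, which is the behaviour the text describes as strengthening or weakening connections over the iterations.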
The fifth layer is the output layer, which consists of a flattening function, a fully connected layer, and a softmax function. The output of the capsule network is cap_num × cap_dim dimensional data, where cap_num is the number of capsules and cap_dim is the capsule dimension. This output is flattened into one-dimensional data and passed to the fully connected layer. The fully connected layer, a feedforward neural network, acts as the classifier: its input is the entire output of the previous layer, and its output is n nodes, where n is the number of categories. The output of the fully connected layer is fed into the softmax function to compute probabilities, from which the final classification result is obtained.
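The flatten, fully connected, and softmax steps of the output layer can be sketched as below. The function names, the weight matrix, and the bias values are hypothetical placeholders; in the actual model the weights are learned during training.

```python
import math
from typing import List, Tuple


def flatten(caps: List[List[float]]) -> List[float]:
    """cap_num x cap_dim capsule output -> one-dimensional list."""
    return [x for cap in caps for x in cap]


def dense(x: List[float], weights: List[List[float]],
          bias: List[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per category."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(caps: List[List[float]], weights: List[List[float]],
             bias: List[float]) -> Tuple[int, List[float]]:
    """Return the index of the most probable category and all probabilities."""
    probs = softmax(dense(flatten(caps), weights, bias))
    return probs.index(max(probs)), probs


# cap_num = 2, cap_dim = 2, n = 3 categories (toy values).
caps = [[1.0, 0.0], [0.0, 2.0]]
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0],
           [0.0, 1.0, 0.0, 0.0]]
bias = [0.0, 0.0, 0.0]
label, probs = classify(caps, weights, bias)
```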

Entity Relation Extraction Based on Part-of-Speech Weighting.
Part of speech can not only improve the accuracy of entity recognition but also benefit text classification. Considering that the capsule network contains location information, this section proposes adding part of speech to reduce the ambiguity of words and enrich the semantic information of the features. Most text processing models that add part of speech and similar features treat them in the same proportion as the other features. However, the importance of part of speech differs across data and models, so a joint entity relation extraction model based on artificial fish swarm part-of-speech weighting is constructed. The artificial fish swarm algorithm is used to optimize the weights of word features and part-of-speech features iteratively, so as to improve the classification effect [21]. The division of part of speech is based on grammatical norms and the meaning of words. Learning the context features of part of speech through BGRU can add grammatical information to sentence vectors. Because parts of speech differ in their importance to the features, self-attention is used to assign a weight to each part of speech. As shown in Figure 3, the model consists of six parts: embedding layer, weight adjustment layer 1, feature learning layer, weight adjustment layer 2, feature binding layer, and relation extraction layer.
Embedding layer. According to the word vector dictionary and the part-of-speech vector dictionary, words and parts of speech are transformed into vector data by word embedding, giving $S = [w_1, \ldots, w_i, \ldots, w_n]$ and $P = [p_1, \ldots, p_i, \ldots, p_n]$; the artificial fish swarm weight $Q = q$ is taken directly as input data.
Weight adjustment layer 1. In weight adjustment layer 1, the part-of-speech vector is multiplied by the artificial fish swarm weight, which adjusts the relative weight of the word vector and the part-of-speech vector: $QP = [qp_1, \ldots, qp_i, \ldots, qp_n]$.

Mobile Information Systems

The deep learning model is random, and different results can be obtained even if the same data is used, so the parameters of each layer in the experiment are fixed by saving the model. The experimental results of the model under the same data are then the same, and deep learning can be combined with the artificial fish swarm algorithm. The flow of part-of-speech weight optimization based on the artificial fish swarm is as follows:
Step 1: The artificial fish swarm algorithm adjusts the positions of the artificial fish according to the objective function in order to find the optimal solution. However, the randomness of the parameters in the deep learning model makes the experimental results random: even under the same configuration parameters and the same experimental data, the results fluctuate, which may cause the artificial fish to move back and forth and interferes with the optimization. To reduce the influence of randomness, the part-of-speech weight is set to 1 and fed into the capsule network entity relation extraction model with part-of-speech weighting. After 50 iterations, the trained model is saved so that the parameters of the deep model are fixed.
Step 2: Initialize the population size, visual field, step size, crowding factor, repetition times, and the initial position of each artificial fish.
Step 3: Using the model saved in step 1, the macro-average F1 value of each artificial fish is calculated as its objective function value, and the optimal value is written to the bulletin board.
Step 4: According to the objective function value of each individual, each artificial fish performs foraging, clustering, rear-end (following), and other behaviors, and the result is compared with the bulletin board to decide whether to update it.
Step 5: Judge whether the termination condition is met. If so, exit the loop, and the value on the bulletin board is the final result; if not, return to step 3. The best value obtained by the artificial fish swarm algorithm is used as the weight of part of speech, and the weighted part-of-speech vector is obtained by multiplying this weight by the part-of-speech vector. The weighted part-of-speech vector and the word vector, spliced after the attention weighting, are taken as the jointly shared part of the capsule network entity relation extraction model with part-of-speech weighting.
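Steps 2 to 5 can be sketched as a simplified one-dimensional artificial fish swarm search over a part-of-speech weight in [0, 1]. This is a sketch under stated assumptions, not the authors' procedure: the objective here is a hypothetical stand-in function (in the paper it is the macro-average F1 value of the saved model), the fish start at evenly spaced positions, the crowding test is a simple neighbour-fraction check, and all parameter values are illustrative.

```python
import random
from typing import Callable, Tuple


def afsa_optimize(objective: Callable[[float], float], n_fish: int = 10,
                  visual: float = 0.3, step: float = 0.1,
                  crowding: float = 0.6, tries: int = 5,
                  iterations: int = 30, seed: int = 0) -> Tuple[float, float]:
    """Maximize objective(q) for q in [0, 1]; returns (best_q, best_value)."""
    rng = random.Random(seed)

    def clip(q: float) -> float:
        return min(1.0, max(0.0, q))

    def move_toward(q: float, target: float) -> float:
        if target == q:
            return q
        direction = 1.0 if target > q else -1.0
        return clip(q + direction * step * rng.random())

    def forage(q: float) -> float:
        # Try a few random points in the visual field; else move randomly.
        for _ in range(tries):
            cand = clip(q + visual * rng.uniform(-1.0, 1.0))
            if objective(cand) > objective(q):
                return move_toward(q, cand)
        return clip(q + step * rng.uniform(-1.0, 1.0))

    # Step 2: initial positions (evenly spaced here, an assumption).
    fish = [i / (n_fish - 1) for i in range(n_fish)]
    # Step 3: the bulletin board holds the best position seen so far.
    best_q = max(fish, key=objective)
    best_val = objective(best_q)

    for _ in range(iterations):
        new_fish = []
        for q in fish:
            near = [p for p in fish if p != q and abs(p - q) <= visual]
            candidates = [forage(q)]
            if near and len(near) / n_fish < crowding:
                center = sum(near) / len(near)
                if objective(center) > objective(q):      # clustering
                    candidates.append(move_toward(q, center))
                leader = max(near, key=objective)
                if objective(leader) > objective(q):      # following
                    candidates.append(move_toward(q, leader))
            new_fish.append(max(candidates, key=objective))
        fish = new_fish
        # Step 4: update the bulletin board only on strict improvement.
        top = max(fish, key=objective)
        if objective(top) > best_val:
            best_q, best_val = top, objective(top)
    return best_q, best_val           # Step 5: fixed iteration budget


def toy_objective(q: float) -> float:
    """Hypothetical stand-in for the macro-average F1 of the saved model."""
    return -(q - 0.8) ** 2


best_weight, best_score = afsa_optimize(toy_objective)
```

Because each objective evaluation in the paper corresponds to running the saved model, a practical implementation would cache objective values instead of recomputing them as this sketch does.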

Entity Relation Extraction Based on Capsule Network.
To verify the effectiveness of the capsule network based on the self-attention mechanism in the joint entity relation extraction model, the macro-average F1 value is taken as the measurement standard. Because the initial values of many parameters are randomly generated during deep learning, the experimental results differ even for the same model with the same experimental data. To reduce the interference of this randomness, each experiment is run three times, and the average of the three runs is taken as the final macro-average F1 result. The model proposed in this section is evaluated from two aspects, model comparison and the influence of model parameters, using the single-variable method to verify the effectiveness of the proposed method. The capsule network joint extraction model (represented by Capsnet), the convolutional neural network joint extraction model (represented by CNN), and the capsule network joint extraction model with the self-attention mechanism (represented by Capsnet + self_attention) were tested with the same data. First, Capsnet and CNN are iterated 300 times. The macro-average F1 values of the two models fluctuate within a small range after 100 iterations, as shown in Figure 4, so the number of iterations is set to 100 in the subsequent experiments.
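The macro-average F1 value used as the measurement standard, together with the averaging over three runs, can be sketched as follows. The function names and the toy labels are illustrative assumptions; the label set is taken as the union of true and predicted labels, and the relation types of the actual dataset are not specified here.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


def average_over_runs(run_scores: Sequence[float]) -> float:
    """Average the macro-F1 of repeated runs to damp random fluctuation."""
    return sum(run_scores) / len(run_scores)


# Toy example with two hypothetical relation labels.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
score = macro_f1(y_true, y_pred)
final = average_over_runs([0.77, 0.78, 0.79])
```

Macro averaging gives every relation category the same weight regardless of how many samples it has, so rare categories influence the score as much as frequent ones.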
As can be seen from Figure 4, the curves of the two models differ when the number of iterations is small and cross over at the 9th iteration. Averaging the experimental results of each model from the 100th to the 300th iteration, the macro-average F1 value of the Capsnet model is approximately 0.78293 and that of the CNN model is approximately 0.77489 (both to five significant figures), so the Capsnet model is about 0.00804 higher than the CNN model.
As can be seen from Figure 5, the difference between the Capsnet model and the Capsnet model with the self-attention mechanism is not very large, but from the 20th iteration onward the model with self-attention is, apart from occasional exceptions, slightly higher than or about equal to the Capsnet model. Averaging the macro-average F1 value from the 40th to the 100th iteration gives 0.77452 for the Capsnet model and 0.78043 for the model with the self-attention mechanism, a difference of 0.00591. This shows that the self-attention mechanism improves the macro-average F1 value of the model to a certain extent and demonstrates the effectiveness of introducing the self-attention mechanism into capsule network entity relation extraction.

Impact of Model Parameters.
In a deep learning model, suitable parameters can improve both the running speed and the learning efficiency of the model. The effects of the capsule dimension and the convolution window size in the capsule network were studied using the single-variable method.

Capsule Network Based on Self-Attention Part-of-Speech Weighting.
The model is analysed from two aspects: macro-average F1 value and experimental time. The details are as follows.

Comparison of Macro-Average F1 Values of Four Models.
The early combination model and the late combination model, which are proposed in this section by combining part-of-speech features, are compared with the BLSTM-based model (denoted BLSTM) and the BGRU-based model proposed in Section 3 (denoted BGRU). Each model is run for 50 iterations, and the experimental results are shown in Figure 8. The macro-average F1 value of each model was averaged from the 40th to the 50th iteration. In descending order, the results are: late combination model 0.78394, BLSTM model 0.77792, early combination model 0.77632, and BGRU model 0.7733. The early combination model is 0.00302 higher than the BGRU model but lower than BLSTM alone. The late combination model gives the largest improvement: it is 0.01064 higher than BGRU alone and 0.00602 higher than BLSTM alone. This verifies the effectiveness of the capsule network joint entity relation extraction model based on self_attention part-of-speech weighting.

Comparison of Running Time of Four Models.
The four models are iterated 30, 50, and 100 times, respectively, to obtain the running time. The experimental results are shown in Figure 9. Across the three experiments, the running times of the early combination model and the late combination model differ little; both are longer than that of BGRU alone but slightly shorter than that of BLSTM alone.
By replacing BLSTM with BGRU and taking part of speech as an extended feature, the macro-average F1 value of the joint entity relation extraction model based on the capsule network can be improved while the running time is reduced, which demonstrates the effectiveness of the joint entity relation extraction model based on self_attention part-of-speech weighting.

The model is iterated 50 times and then saved. This is repeated in three random runs to save three models: model 1, model 2, and model 3. Artificial fish swarm experiments are carried out for each model with the same three groups of data: data 1, data 2, and data 3. A total of 9 experimental results were obtained, each corresponding to the optimal value for one model and one group of data, and the average of the 9 results is 0.84293. The experimental results of the capsule network joint entity relation extraction model without part of speech are shown in Figure 10.

Capsule Network Based on
The artificial fish swarm experiment gives a part-of-speech feature weight of 0.8 (an approximate value obtained by rounding), which is used to compare the following models: the capsule network joint entity relation extraction model based on artificial fish swarm part-of-speech weighting (represented by artificial fish swarm + self-attention mechanism), the capsule network joint entity relation extraction model based on self_attention part-of-speech weighting (represented by self-attention mechanism), and the Pipeline model (represented by Pipeline) [22]. The final experimental results are shown in Figure 10. Each experiment was performed three times and the results were averaged. The macro-average F1 value is 0.77043 for the Pipeline model, 0.7733 for the capsule network joint entity relation extraction model without part of speech, 0.78394 for the late combination model that uses the self-attention mechanism alone, and 0.78565 for the artificial fish swarm + self-attention mechanism model. The capsule network joint extraction model is therefore 0.00287 higher than the Pipeline model; the model with part of speech added is 0.01064 higher than the capsule network joint extraction model without it; and the joint extraction model combined with the artificial fish swarm is 0.00171 higher than the model using the self-attention mechanism alone, which demonstrates the effectiveness of the model.

Conclusion
On the basis of the capsule network joint entity relation extraction model, the semantic information is enriched by adding part-of-speech features, and a joint entity relation extraction model based on self_attention is constructed to weight the parts of speech within a sentence. The experimental results show that the macro-average F1 value of the model is improved. To solve the weight distribution problem between word features and part-of-speech features, an artificial fish swarm algorithm is used to optimize the two feature weights iteratively, and the classification effect is improved by adjusting the weight of part of speech to control the proportion of word vectors and part-of-speech vectors. In summary, part of speech can enrich the feature representation to a certain extent, and the importance of part of speech differs from that of words. In the future, semantic feature representation will be improved by enriching semantic features, refining the model structure, and optimizing the functions and methods used.
Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.