Research on Domain-Specific Knowledge Graph Based on the RoBERTa-wwm-ext Pretraining Model

The purpose of this study is to solve the effective way of domain-specific knowledge graph construction from information to knowledge. We propose the deep learning algorithm to extract entities and relationship from open-source intelligence by the RoBERTa-wwm-ext pretraining model and a knowledge fusion framework based on the longest common attribute entity alignment technology and bring in different text similarity algorithms and classification algorithms for verification. The experimental research showed that the named entity recognition model using the RoBERTa-wwm-ext pretrained model achieves the best results in terms of recall rate and F1 value, first, and the F value of RoBERTa-wwm-ext + BiLSTM + CRF reached up to 83.07%. Second, the RoBERTa-wwm-ext relationship extraction model has achieved the best results; compared with the relation extraction model based on recurrent neural network, it is improved by about 20%∼30%. Finally, the entity alignment algorithm based on the attribute similarity of the longest common subsequence has achieved the best results on the whole. The findings of this study provide an effective way to complete knowledge graph construction in domain-specific texts. The research serves as a first step for future research, for example, domain-specific intelligent Q&A.


Introduction
With the advent of the big data era and artificial intelligence, the knowledge graph in the domain-specific application has received widespread attention. Knowledge map is a new concept proposed by Google in 2012, which is essentially the knowledge base of semantic network. As an effective knowledge representation method of cognitive intelligence of the new generation of artificial intelligence, knowledge graph uses open-source information to train cognitive models and then applies this model for cognitive application, which is an effective formation of anthropomorphic thinking. It has not only become the most intuitive and understandable framework for knowledge representation and reasoning but also improved the efficient of knowledge discovery, such as the problem system based on the domainspecific knowledge graph. However, when the knowledge graph is successfully applied in many fields, ambiguity, difficult links, data sparsity in special fields, and other problems hinder the application task effect based on the domain-specific knowledge graph. Especially in some fields with complex data sources and limited available data, how to provide an accurate knowledge extraction model for new knowledge cognition according to the needs of the field and, at the same time, how to effectively link and fuse the discovered new knowledge with the knowledge base or how to effectively link and fuse the multisource heterogeneous data with the knowledge base is an important foundation for the application of knowledge maps in some fields at present. In view of this, since open-source intelligence such as military books and the Internet has been used as the data source to solve related problems such as the automatic construction of the domain-specific knowledge graph, this research serves as a first step for future research, for example, the construction of knowledge graphs Q&A application. e innovative contribution of this research is as follows: (1) e RoBERTa-wwm-ext model is used to enhance the knowledge of the data in the knowledge extraction process to complete the knowledge extraction including entity and relationship (2) is study proposes a knowledge fusion framework based on the longest common attribute entity alignment technology and brings in different text similarity algorithms and classification algorithms for verification, all of which have achieved good results

Related Work
e knowledge graph was proposed by Google in 2012, and its essence is a semantic network knowledge base. e knowledge graph is divided into a general knowledge graph and a domain-specific knowledge graph. For the former, it emphasizes the scope of knowledge, while for the latter, it emphasizes the accuracy of domain-specific knowledge. At present, the construction and exploration technology of knowledge graphs at home and abroad has become mature. Many research institutions have successively created some large-scale general knowledge graphs, such as YAGO [1], Wikidata [2], and DBpedia [3]. However, for knowledge graphs in professional fields, the services it provides for people are still far from meeting the needs. For this reason, academia and industry have paid more attention to the research of domain-specific knowledge graphs with high professional knowledge accuracy.
Due to the sensitivity of data and low resource information in the field of military security, knowledge graph research in this field is relatively lacking, and the research progress is slow. F. Liao et al. [4], using the knowledge extraction method based on the BiLSTM model, studied the knowledge graph construction of the US military equipment and designed and implemented the US military equipment knowledge graph system. C. Liu et al. [5], aiming at the problems of loose structure of the military equipment database, difficulty in using effectively, low storage efficiency, and chaotic management, used the entity relation extraction method of the dependency syntax tree and CRF to extract information from unstructured text data, build a knowledge map of the military equipment, and achieve good results. D. Song et al. [6] proposed a military knowledge graph construction method based on bibliographic data of military disciplines to solve the problems of sparse data distribution, weak data correlation, and difficulty in using data effectively in the retrieval process of military industrial academic research content.
is method designs the structure of the knowledge graph according to the bibliographic information characteristics of the article and uses the information-rich article titles and keywords to study entity extraction, entity classification, knowledge graph storage, and visualization. Yao Yi et al. [7] relied on the structured data of the existing arsenal to build the equipment concept map, using the method of iterative learning, based on open multisource data to complete the equipment entity to ensure the breadth and depth of the concept map precision. Xing Meng et al. [8] proposed a technical framework for the construction and application of knowledge graphs in the military domain regarding the opportunities and challenges currently faced by knowledge graphs in the military domain. It also discusses the key issues and core technologies in all aspects of the knowledge life cycle, such as knowledge representation based on the ontology system, knowledge extraction based on machine learning, cross-domain-specific knowledge fusion, knowledge computing, and knowledge application. Zhao Yu et al. [9] used the BiLSTM + CRF model for military named entity recognition, extracted relation words through syntactic analysis and hierarchical Dirichlet process (HDP) clustering, and obtained entities and the relationships between entities to build a military knowledge graph.
Knowledge fusion is an important link in the process of domain-specific knowledge graph construction and an important means to ensure knowledge quality and knowledge update. At present, in the process of knowledge graph research in the military field, the technology for knowledge fusion is relatively lacking. In the general field, knowledge fusion is usually based on cross-language entity alignment and same-language knowledge base entity alignment technology research. Cross-language entity alignment techniques mainly include representation learning methods based on translation models and graph neural networks. e translation model takes TransE [10] as the basic model. Models such as JE [11], MTransE [12], IPTransE [13], and AttrE [14] have improved it by using methods such as attribute, relation information, and iterative training and obtained a good effect. e graph neural network uses the global structural information of the graph to match the entities to be aligned in the two knowledge graphs. e GCN-Align [15] model is an earlier model that uses a graph neural network algorithm for entity alignment. It proposes a cross-language knowledge graph alignment method based on a graph convolutional network. e research on entity alignment of the same language is based on the Chinese heterogeneous encyclopedia knowledge base and adopts the paired entity alignment technology with the text similarity algorithm as the core.
It can be seen from the above that most of the knowledge graphs in the military field focus on the construction of knowledge graphs of weapons and equipment. Some military knowledge graph research does not subdivide the ontology granularity and builds a military knowledge graph equivalent to an encyclopedia, which is similar to the military module in the general knowledge graph and cannot meet accurate knowledge services. At present, the only research on knowledge graph construction in the military field mainly focuses on the extraction of knowledge, and there are relatively few studies on knowledge fusion. Based on the research on the military intelligence information of major military news platforms and the suggestions of military experts, this study refines the data granularity of the knowledge graph in the military field and extracts opensource military information through the deep learning technology. Second, this study obtains the diverse and heterogeneous military domain knowledge base on the Internet, combines text similarity and machine learning algorithm, proposes a knowledge fusion algorithm framework based on the longest common attribute entity alignment technology, which is used for knowledge fusion and updating, and then, constructs a high-quality maintainable knowledge graph of the military.

Key Technology
e construction of the domain-specific knowledge graph is divided into the construction of the ontology layer and the data layer. In this article, by studying the influence of data distribution and data granularity in the open-source military intelligence on the quality of knowledge graph construction, the ontology of the graph is divided into 9 entity types, including countries, aircraft carriers, missiles, radars, ships, aircraft, military bases, waters, and islands, and the relationship is divided into 5 types: country, collaboration, activity, equipment, and NULL, which are used to verify the reliability of the knowledge graph construction experiment. e definition of the relationship between entities is shown in Table 1.
e technologies of domain-specific knowledge graph data layer construction mainly include domain-specific data acquisition, knowledge extraction, knowledge fusion, knowledge storage, and graph visualization. Among them, knowledge extraction and knowledge fusion technology are the most reused, which determine the efficiency and quality of graph construction. erefore, this study focuses on knowledge extraction and knowledge fusion technology in the construction of a knowledge graph. e technical architecture of the knowledge graph construction is shown in Figure 1.

Data Annotation.
Data annotation is an important part of data preprocessing, which determines the quality of knowledge extraction. e entity annotation adopts the BIO annotation method; that is, the "B-entity type" indicates the mark of the beginning word of the entity, the "I-entity type" indicates the mark of the subsequent word of the entity, and "O" indicates the nonentity part. According to the definition of the ontology library, this study annotates 9 types of military entities, including NAT (country), PLA (aircraft), VES (ship), MIS (missile), WAT (water area), AIR (aircraft carrier), BAS (military base), RAD (radar), and ISL (island). An example of entity annotation is shown in Figure 2.
Relation annotation is based on entity annotation. We segment the text data with entity annotations in units of sentences and judge and annotate the relationship between entities contained in each unit according to the definition of the relational ontology library. Examples of relational annotations are shown in Table 2.

Knowledge Extraction.
Knowledge extraction is an important step in the process of knowledge graph construction, which is mainly divided into two parts: namedentity recognition (entity extraction) and relation extraction. In this study, the method based on deep learning is used to extract knowledge from unstructured text data, and the related structured knowledge base is extracted according to the organizational structure design rules of the knowledge base. is article mainly studies knowledge extraction from unstructured text data and uses a pretraining model to vectorize text data to enhance knowledge. e acquired text semantic feature vector is combined with the deep learning model to learn the features of a large number of texts corpus and realize the knowledge extraction of unstructured text data.

RoBERTa-wwm-ext Pretraining Model.
e BERT [16] (Bidirectional Encoder Representation from Transformers) model is a pretraining model for encoding text characters proposed by Google in 2018, which uses the idea of transfer learning. Knowledge or patterns learned in one domain or task are applied to a different but related domain or problem. e BERT model mainly uses the encoder structure of the transformer model [17], which can better obtain the semantic relationship in the text data. e BERT model uses massive Wikipedia data for model training so that each word can obtain a better vector feature.
e RoBERTa [18] (robustly optimized BERT approach) model is an improved version of the BERT model, using more training data and improving the training method. In terms of training method, RoBERTa removes the NSP (Next Sentence Prediction) task, and at the same time, compared with BERT's static masking mechanism, RoBERTa uses a dynamic masking mechanism. Although RoBERTa has only slightly improved the model mechanism, it has achieved an effective improvement in the use effect.
WWM [19] (Whole Word Masking) is the whole word mask, which changes the method of using a single Chinese character as the unit of mask in the Chinese BERT model to mask the whole Chinese word.
e advantage of such training is that the words encoded by the model have the meaning of the context words.
e Harbin Institute of Technology iFLYTEK Joint Laboratory released the Chinese RoBERTa-wwm-ext Computational Intelligence and Neuroscience pretraining model, which achieved the best results in several natural language processing tasks. erefore, this study uses the RoBERTa-wwm-ext pretraining model open sourced by the Harbin Institute of Technology to obtain text vectors with semantic features and encode the text.

Named Entity Recognition Model.
e named entity recognition model in this study is mainly divided into three layers, namely, the text encoding layer of the pretraining model, the bidirectional long short-term memory network layer (BiLSTM), and the conditional random field layer (CRF). e structure of the model is shown in Figure 3. First, this model uses sentences containing military information as input data and encodes the text information of the sentences through the RoBERTa-wwm-ext model. A sequence of word vectors (X 1 , X 2 , · · · , X n ) with semantic information is obtained and fed into a bidirectional LSTM layer to automatically extract the features of sentences. e bidirectional LSTM layer splices the hidden state sequence (h 1 , h 2 , · · · , h n ) output by the forward LSTM and the sequence output by the reverse LSTM at the corresponding position according to the position, accesses the linear layer, and maps it into a K-dimensional hidden vector. Among them, K is the number of entity labels. At this time, K classification can be performed on the P vector corresponding to each character vector, and data labeling can be performed. To better obtain sentence-level annotation information, the output P vector is input into the CRF layer.

Relation Extraction Model.
e relation extraction model consists of a text encoding layer, a fully connected layer, and a multiclassifier layer. First, we obtain entity A and entity B whose relationship needs to be determined in the text data and the text data containing these two entities and connect the three to form the input text of the model. e text encoding layer encodes the input text data to form a hidden layer state vector with semantic features. e fully connected layer is used to normalize the obtained feature  vector containing semantics, and the classification result is input through the multiclassifier layer. It is expressed in a mathematical equation as follows: where H represents the semantic feature vector of the hidden layer obtained by the RoBERTa-wwm-ext model, h is the output of the fully connected layer, w and b are the weight and bias parameters, respectively, and p is the input probability of each relation label. e model structure is shown in Figure 4.

Knowledge Fusion.
e knowledge obtained through knowledge extraction has a wide range of sources, the quality of knowledge from different sources is different, and the expression form is not uniform, which will lead to problems such as knowledge redundancy and low accuracy. Integrating multisource and heterogeneous knowledge and solving problems such as knowledge duplication is the key to improving the quality of knowledge graph construction. First, knowledge fusion is to fuse entities with different expressions of the same entity concept from multiple data sources to obtain unified knowledge. Knowledge fusion is divided into two stages. In the first stage, knowledge fusion is performed on the extracted results to ensure the formation of high-quality knowledge graphs. In the second stage, based on the constructed knowledge graph, knowledge fusion is carried out with the external knowledge base to realize the expansion and update of knowledge.

Knowledge Fusion Framework.
e knowledge fusion architecture of this study is shown in Figure 5. e data come from two parts, namely, the result of knowledge extraction and the external knowledge base, corresponding to two stages of knowledge fusion. e data preprocessing process organizes the structured and semistructured data in the external knowledge base into unified knowledge triples, through manual ontology matching and entity classification and entity alignment within each ontology type, and finally completes knowledge fusion.

Entity Alignment.
e entity alignment algorithm is the core of the knowledge fusion framework.
is study adopts the paired entity alignment algorithm; that is, the similarity screening of entities is performed in advance, the entities that cannot be similar are filtered out, and the possibly similar entities are marked as candidate entity pairs.

Computational Intelligence and Neuroscience
According to the attribute similarity between candidate entity pairs, a machine learning algorithm is used to classify them into a matching set or a nonmatching set. e algorithm flow is shown in Figure 6. e attribute similarity algorithm is an important part of the paired entity alignment algorithm in this study. Due to the different sources of knowledge, the attribute lengths of entities describing knowledge are inconsistent. erefore, this study proposes a method of maximum common attribute length to calculate the similarity of attributes. We select the attribute of the maximum coincidence of two entities in the entity pair to calculate, which is expressed as follows: CommomPropertty e 1 , e 2 � Property e 1 ∩ Property e 2 . (2) Among them, e 1 and e 2 represent entities and Property e 1 and Property e 2 represent attributes corresponding to entities. e lengths of common attributes between different entity pairs are inconsistent, which leads to unequal lengths of attribute similarity feature vectors obtained from attribute similarity calculation. erefore, the obtained feature vector is subjected to dimension reduction and normalization processing. Since the name of the entity has a high degree of identification for the entity, the entity name is used as the main attribute Property main , and the other attributes are used as auxiliary attributes Property secondary � P s1 , P s2 , · · · , P s . e auxiliary attributes are averaged to reduce the original multidimensional feature vector to two dimensions. ereby, the similar feature quantities of the attributes can be unified. Normalizing the attribute similarity vector can represent the similarity between entity pairs. Based on the abovementioned ideas, the similarity between entity pairs can be expressed as follows: e attribute similarity algorithm adopts text similarity algorithms such as SimHash [20], edit distance, longest common subsequence, and BERT word embedding. e classification algorithm adopts machine learning algorithms such as support vector machine [21] (SVM), decision tree, logistic regression, and naive Bayes. Combined experiments with different attribute similarity algorithms and classification algorithms are used to verify the effectiveness of entity alignment.

Datasets.
is study takes the military intelligence information data as the main research object, and the data mainly come from open-source military news. We use crawler technology to collect unstructured text information containing the military intelligence information such as global military, military forums, and military modules in various news platforms for knowledge extraction. Second, we extract the structured and semistructured data in the open-source weapon and equipment knowledge graph, weapon encyclopedia, Wikipedia knowledge base, and the US military knowledge base for knowledge fusion. e experimental datasets in this study are divided into three parts: named entity recognition, relation extraction, and entity alignment.

Named Entity Recognition Dataset.
e named entity recognition part of the dataset in this study is constructed by manual annotation. e dataset contains 9 types of entity labels, including a total of 53,295 entity labels and 14 MB of text data. e distribution of entity labels is shown in Table 3.

Relation Extraction Dataset.
e relation extraction part of the dataset in this study is jointly constructed by the methods of automatic labeling of entity-relationship types and manual review. e dataset contains a total of 2251 pieces of data with 5 types of relationship labels. e distribution of relationship labels is shown in Table 4.
Since the length of the text corpus has a certain influence on the difficulty of relation extraction, the text length is counted. e text length distributions of the training set and test set are the same, as shown in Figure 7.

Entity Alignment Dataset.
e data used in this study to verify the entity alignment algorithm in the knowledge fusion framework come from two data platforms, the opensource military weapon and equipment knowledge graph and the weapons encyclopedia. From the acquired data, a total of 1324 prealigned entity pairs were obtained by random sampling and manual alignment of four types of data, including missile weapons, aircraft carriers, vessels, and plane. ese entity pairs contain a total of 28447 attribute triples. e specific data distribution is shown in Table 5.
rough the analysis of the dataset composed of two data sources, it is found that two entities have the same name in some prealigned entity pairs. In order to improve the robustness of the trained model, a new entity name is formed Longest common property ("Nimitz" aircra carrier, country of origin, United States) ("Nimitz" aircra carrier, captain, 332.9 meters) ("Nimitz" aircra carrier, width, 40.8 meters) ...
("Nimitz" class nuclear powered aircra carrier, "Queen Elizabeth" class aircra carrier) ... Computational Intelligence and Neuroscience by inserting, deleting, or replacing characters in the data from one of the sources according to the naming rules of the military field and forming a prealigned entity pair in this experiment. Negative samples are obtained by randomly replacing entities between prealigned entity pairs. e prealigned entities are scrambled by their data categories to form 50% positive and 50% negative samples, and finally, 2648 entity pairs are formed for alignment experiments.

Named Entity Recognition Model Parameters.
e amount of data used in the training of the named entity recognition model and the parameter volume of the pretraining model are large. Considering the hardware environment of the experiment and the reliability of model training, the main parameters of the named body recognition model in this study are shown in Table 6.

Relation Extraction Model Parameters.
e size of the hidden layer state vector of relation extraction and the size of the number of hidden layers refer to the RoBERTa model parameters. Considering the distribution of text lengths in the dataset, we set the maximum text input (max_length) to 128 characters. Other parameters are based on the experience of deep learning model training, and the specific experimental parameter settings are shown in Table 7.

Evaluation Metrics.
is experiment uses three indicators of precision, recall, and F1 value to evaluate the reliability of the model. e calculation expressions of the three indicators are as follows: where TP represents the number of labels that the model can correctly detect, FP represents the number of irrelevant labels detected by the model, and FN represents the number of labels not detected by the model.

Named Entity Recognition Experiment
Results. e named entity recognition dataset is used for data, the size of the training dataset is 10.3 MB, and the test is 3.2 MB. e experimental results of the trained model on the test set are shown in Table 8. e accuracy of some of the entity labels is low, the reason is that the amount of labeled data for this part of the entity labels is small, and the number of occurrences in the military dataset used in this experiment is small. However, it does not affect the reliability of the model and can be corrected by increasing the number of corresponding labels.
To verify the effectiveness of the named entity recognition model and dataset, four groups of comparative experiments were set up in this experiment as shown in Table 9. ese include the BiLSTM + CRF model, IDCCN model, BERT + BiLSTM + CRF model, and BERT-wmmext + BiLSTM + CRF model. From the results of the comparative experiments, it can be seen that the named entity recognition model using the RoBERTa-wwm-ext pretrained model achieves the best results in terms of recall rate and F1 value.

Relation Extraction Experiment Results.
e relation extraction experiment uses 1800 text data for model training and 451 text data for testing. Each relationship label has achieved better results on the test set, and the experimental results of the trained model on the test set are shown in Table 10.
To verify the effectiveness of the relation extraction model and dataset, four groups of comparative experiments were set up in this experiment, and the more popular deep   Figure 7: Distribution of text length in relation extraction. 8 Computational Intelligence and Neuroscience learning models were selected as comparative experiments. e comparative experimental results are shown in Table 11. From the comparative experimental results, it can be seen that the BERT series of relationship extraction models have good results, and the BERT-wwm-ext relationship extraction model has achieved the best results, which is similar to the RoBERTa-wwm-ext-based relationship extraction model used in this study. Compared with the relation extraction model based on recurrent neural network, it is improved by about 20%∼30%.
After further analysis, there are two reasons for the excellent effect of the experiment. On the one hand, it is the superiority of the BERT model, which is formed based on massive data training and has a good ability to express knowledge.
e RoBERTa-wwm-ext model is a further improvement of the BERT model, so the experiment has achieved good results. On the other hand, the dataset used for the experiment contains fewer relationship types, which may lead to a high effect on the experiment. In the future, the relationship type of the dataset will be increased to further verify the effect of the model.

Entity Alignment Experiment Results.
e entity alignment experiment in this study is based on the scikitlearn and bert4keras framework. Combined experimental analysis of four attribute similarity algorithms of SimHash, edit distance, longest common subsequence (LCS), and BERT word embedding and four classification algorithms of support vector machine (SVM), decision tree, logistic regression (LR), and naive Bayes (NB) was conducted. e experimental results are shown in Table 12.
According to Table 12, the entity alignment algorithm based on the attribute similarity of the longest common subsequence has achieved the best results on the whole, and the combination with the classification algorithm of logistic regression has achieved the best results. Second, the algorithm based on SimHash is better overall and achieves the

Discussion
is study aims to provide an efficient and reliable method for the construction of knowledge graphs in specific fields. Taking the military field as an example, the two modules of knowledge extraction and knowledge fusion mentioned in the study are experimentally verified. Different from previous studies, this study uses the pretrained model RoB-ERTa-wwm-ext for knowledge enhancement, which improves the effect of knowledge extraction. In addition, this article studies the knowledge fusion module that is less studied in the construction of domain knowledge graphs in the past and proposes a knowledge fusion framework based on the longest common attribute entity alignment technology. In the process of graph construction, it is used to ensure its quality, and it is used for knowledge expansion and update in the process of graph maintenance.
In the knowledge extraction module, this study uses the RoBERTa-wwm-ext model to vectorize text data, which is different from the previous random initialization or word2vec for text vectorization. e advantage of this is that the vector of the knowledge extraction model in the data input stage has excellent semantic expression ability, so the effect of the trained knowledge extraction model will be greatly improved. Most of the previous research on domain knowledge graph construction focused on knowledge extraction, but this research not only studies knowledge extraction but also studies knowledge fusion and proposes a knowledge fusion framework. e framework brings in different text similarity algorithms and machine learning classification algorithms, all of which have achieved good results.
ere are also some limitations in this study. First, this study uses the manual annotation method in the data preprocessing stage to annotate the named body recognition dataset and relation extraction dataset, which increases a lot of labor costs. Although the most effective method is manual annotation at present, semiautomatic annotation methods will be considered in the future, for example, annotating a small number of correct seed data and using the seed data to iteratively annotate most of the data.
Second, in the knowledge extraction stage of this study, the recognition effect of some entity labels in the named entity recognition experiment is not good. After analyzing the experimental process, in the named entity recognition experiment, due to the small number of annotations for some labels, the data distribution of each label in the data samples used for training is unbalanced, resulting in poor recognition of some labels. In the future, we will continue to optimize the datasets of each experimental module to improve the experimental effect of each module.
ird, in the knowledge fusion stage, a large amount of text similarity between entity attributes needs to be calculated in the process of entity alignment, which will inevitably lead to the low computational efficiency of the knowledge fusion framework. In the future, the domain synonym dictionary and stop word dictionary will be added to the text similarity calculation process, which can reduce the number of text similarity calculations and optimize the efficiency of the knowledge fusion framework.

Conclusion
is study combines the machine learning and deep learning technology to study the automatic construction technology of knowledge graphs in the military domain and focuses on the knowledge extraction and knowledge fusion technology in the process of domain-specific graph construction. e RoBERTa-wwm-ext model is used to enhance the knowledge of the data in the knowledge extraction process to complete the knowledge extraction. Second, this study proposes a knowledge fusion framework based on the longest common attribute entity alignment technology and brings in different text similarity algorithms and classification algorithms for verification, all of which have achieved good results. e recognition effect of some labels in the named entity recognition model in this study is not good. In the future, the model training dataset and the architecture of the model will be improved to improve the effect of entity extraction. In knowledge fusion, a large amount of data calculation needs to be performed. e current knowledge fusion framework  has low computational efficiency. In the future, a domain synonym dictionary and a stop word dictionary will be added to perform word segmentation calculation on attribute information to improve computational efficiency.
Data Availability e data supporting the findings of this study are available within the article.

Conflicts of Interest
e authors declare that they have no conflicts of interest.