Improvement and Analysis of Semantic Similarity Algorithm Based on Linguistic Concept Structure

With the rapid development of the information age, social groups and institutions produce large amounts of data every day. Semantic similarity algorithms emerged to store, identify, and manage such data more efficiently. However, the accuracy of the traditional semantic similarity algorithm is relatively low, and its convergence is poor. To address this problem, this paper starts from the conceptual structure of language and analyzes two aspects: the depth of the language structure and the distance between nodes. For information of the specific data resource description framework type, the weights of interconnecting edges are used for impact analysis, so that the semantic similarity analysis covers all information data. Based on these improvements, this paper also systematically establishes a data information modeling process based on linguistic conceptual structure and builds the corresponding model. In the experimental part, the improved algorithm is simulated and analyzed. The simulation results show that, compared with the traditional algorithm, the improved algorithm achieves a clear improvement in accuracy.


Introduction
The rapid development of the information age means that the world produces large amounts of data at all times. Text is an important carrier of modern information, so processing and analyzing it efficiently and reasonably is important. To further improve data processing speed and accuracy, text similarity processing technology has emerged, and the corresponding semantic similarity algorithms have become increasingly necessary [1,2]. The traditional semantic similarity algorithm is mainly used in intelligent data retrieval, automatic question answering systems, and text retrieval systems. The difficulty of semantic similarity processing varies by language [3,4]. Generally speaking, Chinese semantic similarity processing is more difficult than English semantic similarity processing. The traditional semantic similarity algorithm shows little intelligence when dealing with current Chinese and English data: it mainly searches by keyword. When the data volume is large or Chinese and English are mixed, the machine essentially has no method that can effectively understand the semantic relationships within the mass data, which reduces the accuracy of search and processing. Therefore, it is important to improve the traditional semantic similarity algorithm so that different data structures and data systems can be effectively interconnected, machines can understand the semantic associations in the data, and the data can ultimately be processed effectively.
Among current data information processing issues, semantic similarity processing is widely regarded as the core. Within this area, the currently popular data management technology has received considerable attention. In essence, it is still an improvement of the traditional semantic processing algorithm. It unifies the identifiers in the semantic network, standardizes the data format with the data resource description framework, and finally realizes the interconnection of various data. In the process of data management, the traditional semantic processing algorithm is improved to some extent and isolated data are associated, forming a whole data network and finally a global data space. On the basis of such an organic whole, semantic information can be searched and discovered quickly [5][6][7][8]. This is the core approach of current semantic similarity processing. Its shortcomings are obvious, but to some extent it represents the development trend of data information processing, and other algorithms should build on it for subsequent improvement and optimization. Research on improving the semantic similarity algorithm is reviewed in the second section of this paper [9]. To solve the problems of the traditional semantic similarity algorithm, this paper studies its high-dimensional sparsity and accuracy. It analyzes the depth of language structure and the node distance from the perspective of linguistic concept structure and describes information of the framework type by using interconnecting edge weights. In this way, the semantic similarity analysis of the entire information data is realized [10,11].
Based on the above improvements, this paper also systematically establishes the modeling process of data information based on linguistic conceptual structure and builds the corresponding model. In the experimental part, the improved algorithm is simulated and analyzed. The simulation results show that the proposed algorithm significantly improves similarity judgment accuracy compared with the traditional algorithm and further alleviates the problem of high-dimensional data sparsity.
Based on the above analysis, the rest of this paper is organized as follows: the second section analyzes the current research status of semantic similarity algorithms and points out the advantages and disadvantages of various algorithms; the third section analyzes semantic similarity in terms of language structure depth and node distance, based on the conceptual theory of language structure, and systematically standardizes the modeling process of data information under the linguistic conceptual structure; the fourth section simulates and analyzes the improved algorithm; finally, the paper is summarized.

Related Work Analysis: Analysis of the Current Research Status of Semantic Similarity Algorithm
To address the search accuracy and convergence problems of the semantic similarity algorithm in data information processing, many researchers and institutions have studied it and proposed different optimization algorithms. On search accuracy, researchers in the United States proposed a vector space model implicit semantic search algorithm, which represents information data as a vector of feature weight components. It simplifies the complex relationship between a text and its keywords so that the text is represented by a simple vector. In practical application, the model uses weights to reflect the importance of keywords, but this algorithm requires a large amount of calculation and has no advantage when the amount of data is large [12,13]. European scholars proposed extending the existing semantic similarity algorithm, focusing on the similarity between sentences. It uses two-level dynamic programming to calculate the similarity between two sentences of different lengths, but the algorithm needs to balance the similarity between phrases in real time [14,15]. Other scholars have proposed machine learning or similar clustering algorithms. These advanced algorithms deconstruct the ontology model corresponding to the text, but they need to analyze the entire database in advance and then build an ontology model of it, so the quality of the database ontology model directly affects the quality of the whole algorithm. Despite these open problems, such algorithms have been widely used in practice.
Asian researchers summarized the shortcomings of the above research and proposed a similarity calculation algorithm based on text ontology. The algorithm mainly comprises construction of the similarity structure, accurate extraction of the semantic content of the data, and similarity calculation. It has some application value and solves the problems of the traditional semantic similarity algorithm to a certain extent, but it still suffers from algorithm loss when the amount of data is large [16,17]. Based on the structure of text information, other researchers proposed a distance algorithm. This algorithm first calculates the distance between texts in the corresponding ontology model: the farther apart two nodes are, the smaller their similarity. The algorithm relies on a complete semantic dictionary and its hierarchical structure, but the model covers only a small range of text data and lacks advantages in specialized text processing fields [18,19]. Building on research into the distance algorithm, Chinese scholars proposed using semantic distance. On this basis, a hybrid Chinese-English semantic similarity algorithm was proposed, and knowledge from information theory was put into practical application [20][21][22][23].
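The vector space model mentioned in this survey can be illustrated with a minimal sketch. It assumes plain term-frequency weights and cosine similarity, which the surveyed work may not use exactly; the function names `term_weights` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def term_weights(text):
    """Represent a text as a sparse vector of term-frequency weights (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)
```

Because every text is compared through its full keyword vector, the cost grows with vocabulary size and corpus size, which is consistent with the scaling limitation noted above.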

Improvement Analysis of Semantic Similarity Algorithm Based on Linguistic Structure Concept
In this section, to address the problems of the traditional semantic similarity algorithm, we start from the linguistic concept structure and analyze semantic depth and distance to build an improved semantic similarity algorithm. The principle framework of the improved algorithm is shown in Figure 1. As the figure shows, the core of the improvement lies in analyzing the semantic depth between different text data and the distance between nodes. For information of the special associated data resource description framework type, the weights of interconnecting edges are used for impact analysis. The figure also shows that the improved semantic similarity algorithm is based on a specific algorithm model.

Language Structure Depth and Node Distance Trigger Analysis
In order to solve the high-dimensional sparsity problem and the accuracy problem of the traditional semantic similarity algorithm, this section uses the linguistic concept structure to optimize the algorithm in terms of text-dependent information features and node distance. The core principle architecture of the algorithm is shown in Figure 2. The figure shows the principle and calculation method of the two optimization points of the improved algorithm.
As can be seen from Figure 2, the details of the corresponding deep analysis algorithm of language structure and the analysis algorithm of node distance of language structure are as follows.

Deep Analysis Algorithm of Language Structure.
The depth analysis algorithm of language structure is mainly a depth analysis of the commonality among text information [24][25][26]. In essence, it uses the amount of information shared between two or more texts, together with the corresponding common information nodes, for depth analysis. Through the analysis of this shared information, it computes the similarity between text words, and the core scheme is designed around the application scenarios in this paper. The calculation of similarity between text words is given by formula (1), where W_i and W_j represent two different texts and H(W_i) and H(W_j) represent the probability functions of the corresponding texts in the text vocabulary. Assuming that the pieces of text information are independent of each other, the text similarity probability function is the product of the independent functions. Based on this core formula, the similarity between two words or texts is calculated from the information they share. The calculation is given by formula (2), in which the overall similarity serves as the reference basis of the semantic similarity algorithm and Ha(W_i, W_j) represents the set of information shared between the texts. According to formula (2), when the text information is identical, the similarity is 1. Combining formulas (1) and (2) yields formula (3), the semantic similarity between different texts based on shared information. The flow chart of the language structure depth analysis algorithm based on these calculations is shown in Figure 3.
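As an illustration of similarity based on shared information, the following sketch swaps in a well-known Lin-style ratio of shared information content to total information content. This is an assumption for illustration only and is not a reconstruction of formulas (1) to (3); the function names and the probabilities passed in are hypothetical. It does reproduce the two properties stated in the text: independent probabilities multiply, and identical texts have similarity 1.

```python
import math


def joint_probability(p_i, p_j):
    """Under the independence assumption, the joint probability is the product."""
    return p_i * p_j


def information_content(p):
    """Information content -log(p) of an item with probability p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p)


def shared_information_similarity(p_i, p_j, p_shared):
    """Lin-style ratio (assumed form): twice the shared information
    divided by the total information of the two items."""
    total = information_content(p_i) + information_content(p_j)
    if total == 0.0:
        return 1.0  # neither item carries information; treat them as identical
    return 2.0 * information_content(p_shared) / total
```

When the shared information is everything the two items contain (`p_shared` equal to both `p_i` and `p_j`), the ratio is 1; when the only shared information is trivially probable (`p_shared` close to 1), it approaches 0.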

Trigger Analysis of Node Distance.
In order to improve the accuracy of the semantic similarity algorithm, the distance between the semantic nodes of different texts is introduced as an auxiliary calculation. Different text information is arranged in one or more network structure diagrams according to set organization rules, and the shortest connection distance between different texts is found in this structure diagram [27][28][29]. On this basis, the more edges the path between two nodes passes through, the longer the semantic distance between them and the lower the similarity between the two texts; the fewer edges on the path, the higher the similarity. The flow chart of trigger analysis based on node distance is shown in Figure 4.
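The edge-counting idea can be sketched as a breadth-first search over an adjacency-list graph. The mapping from edge count d to similarity, 1 / (1 + d), is an assumption chosen only because it decreases with distance; the paper's own mapping is not given in this text. The graph `GRAPH` and the function names are hypothetical.

```python
from collections import deque


def shortest_edge_count(graph, source, target):
    """Breadth-first search; returns the number of edges on a shortest path, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # the two nodes are not connected


def path_similarity(graph, a, b):
    """Assumed mapping 1 / (1 + d): fewer edges on the path, higher similarity."""
    d = shortest_edge_count(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


# Hypothetical undirected concept network, stored as adjacency lists.
GRAPH = {
    "animal": ["dog", "cat"],
    "dog": ["animal", "puppy"],
    "cat": ["animal"],
    "puppy": ["dog"],
}
```

In this toy network, "dog" and "cat" are two edges apart while "puppy" and "cat" are three edges apart, so the first pair scores higher.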
According to the information of the special data resource description framework type, the weight of the interconnecting edge is used for impact analysis [8,30]. In practical analysis, the main factors that affect the weight of an interconnecting edge are the depth of the layer where the connecting edge is located, the density of the edges near the connecting edge, and the out-degree of the corresponding node. The corresponding core calculation is given by formula (4). Based on formula (4), the semantic distance between data text information nodes is then calculated for the special data types. In order to solve the problem that traditional node frequency statistics rely too much on a large-scale database, a text data domain model is used in the design, and the corresponding unified calculation process is as follows:
Step 1: establish a tree model of the data text; the model has one and only one root node.
Step 2: assume that the only relationship between nodes in the tree is the upper-lower level relationship and that there is only one path between any two nodes.
Step 3: except for the root node of the tree, every node has exactly one parent node.
Step 4: count all the child nodes.

Figure 2: Schematic diagram of the improved semantic similarity algorithm based on linguistic concept structure.
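The four steps of the text data domain model can be sketched as a small tree class. This is a minimal illustration under stated assumptions: "count all the child nodes" is read here as counting all descendants of a node, the class name `ConceptTree` and the example vocabulary are hypothetical, and no edge weight formula is implemented because formula (4) is not reproduced in this text.

```python
class ConceptTree:
    """Tree with exactly one root (Step 1); every other node has one parent (Step 3)."""

    def __init__(self, root):
        self.root = root
        self.parent = {root: None}
        self.children = {root: []}

    def add(self, node, parent):
        if node in self.parent:
            raise ValueError(f"{node!r} is already in the tree")  # enforces Step 3
        if parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        self.parent[node] = parent
        self.children[node] = []
        self.children[parent].append(node)

    def depth(self, node):
        """Number of edges from the root down to `node`."""
        d = 0
        while self.parent[node] is not None:
            node = self.parent[node]
            d += 1
        return d

    def descendant_count(self, node):
        """Step 4 (assumed reading): count every node below `node`."""
        stack = list(self.children[node])
        count = 0
        while stack:
            current = stack.pop()
            count += 1
            stack.extend(self.children[current])
        return count

    def path_length(self, a, b):
        """Edges on the unique path between a and b (Step 2)."""
        ancestors = {}
        node, d = a, 0
        while node is not None:
            ancestors[node] = d
            node = self.parent[node]
            d += 1
        node, d = b, 0
        while node not in ancestors:
            node = self.parent[node]
            d += 1
        return d + ancestors[node]


def build_example_tree():
    """Hypothetical domain vocabulary used only for illustration."""
    tree = ConceptTree("entity")
    tree.add("animal", "entity")
    tree.add("plant", "entity")
    tree.add("dog", "animal")
    tree.add("cat", "animal")
    return tree
```

Counting descendants inside the domain tree gives a frequency-like statistic for each node without consulting a large external corpus, which is the motivation stated above.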

Modeling Process of Data Information Based on Linguistic Conceptual Structure
Based on the improved semantic similarity algorithm, the paper also designs a data information association model between texts. The design considers two levels, which correspond to the logical relationships and the semantic relationships between different texts. The logical level mainly considers the AND, OR, and NOT relationships between information texts, and the semantic level mainly considers how the semantic information of different data texts is related. The process of establishing the data information association model on this basis is shown in Figure 5; the main associations are constructed at the logical level and the semantic level. At the logical level, this paper mainly checks the attribute values of different texts and establishes associations between texts with the same attribute values; that is, texts with the same attribute value are given the same identifier. At the semantic level, this paper mainly constructs or adds semantic relationships between different texts.
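The logical-level association described here can be sketched as follows. The record layout, the identifier format, and the function name `assign_identifiers` are assumptions made for illustration; the paper does not specify them.

```python
def assign_identifiers(records, attribute):
    """Give texts that share a value of `attribute` the same identifier.

    `records` maps a text name to a dict of attribute values (hypothetical layout).
    Texts that lack the attribute are left without an identifier.
    """
    value_to_id = {}
    result = {}
    for name, attrs in records.items():
        value = attrs.get(attribute)
        if value is None:
            continue
        if value not in value_to_id:
            # Hypothetical identifier format: attribute name plus a running number.
            value_to_id[value] = f"{attribute}:{len(value_to_id) + 1}"
        result[name] = value_to_id[value]
    return result


# Hypothetical example records.
RECORDS = {
    "text1": {"author": "A", "year": 2001},
    "text2": {"author": "A", "year": 2005},
    "text3": {"author": "B", "year": 2001},
    "text4": {"year": 1999},
}
```

With these records, `assign_identifiers(RECORDS, "author")` links text1 and text2 under one identifier, gives text3 a different one, and skips text4.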

Experiment and Analysis
In order to verify the superiority of the algorithm, this part uses a large knowledge base as the basis for experimental simulation. The experiment mainly verifies the similarity values. Three groups of three types of words are used to test the algorithm. The words and the similarity calculation results are shown in Table 1, and the line graphs obtained from further experiments are shown in Figures 6(a)-6(c). The figures show that the algorithm in this paper is more in line with basic human cognition when judging the similarity between different texts.
It can be seen from Table 1 that when the words are nouns, adjectives, and verbs, the algorithm in this paper is clearly better than the traditional algorithm, although there is little difference in the accuracy of verb-level similarity, which is a direction for further research and improvement. A similarity simulation experiment is also carried out for information of the special data resource description framework type. The experimental object is an information number database of a certain data resource description framework type, and the information node attributes are author, nationality, number classification, and publishing year. In order to show the superiority of this algorithm in dealing with such data, two other common algorithms, the RO algorithm and the H algorithm, are selected for comparative analysis. The similarity calculation results are shown in Table 2; eight types of books were selected as the special-type data. The corresponding similarity line graphs are shown in Figures 7(a)-7(c). The graphs show more intuitively that the algorithm in this paper retains its advantage in similarity calculation when dealing with special data types and is also more in line with human judgment.
To sum up, the experimental analysis shows that this algorithm has obvious advantages compared with the traditional semantic similarity algorithm.

Conclusion
In this paper, current problems in information data processing and analysis are analyzed, the semantic similarity algorithms proposed by other scholars are studied, and the problems of the existing algorithms are pointed out. On this basis, in order to solve the problems of the traditional semantic similarity algorithm, this paper studies and verifies its high-dimensional sparsity and accuracy problems. Starting from the linguistic conceptual structure, this paper analyzes the depth of language structure and the distance between nodes. For information of the special data resource description framework type, the weights of interconnecting edges are used for impact analysis so as to realize semantic similarity analysis of the full information data. Based on the above improvements, this paper also systematically establishes the modeling process of data information based on linguistic conceptual structure and builds the corresponding model. In the experimental part of this paper, the improved algorithm is simulated and analyzed. The simulation results show that the proposed algorithm has a clear accuracy improvement compared with the traditional algorithm. At the same time, the problem of high-dimensional data sparsity has been further alleviated. In future research, we will further explore the essential meaning of linguistic structure and analyze the semantic similarity algorithm on that basis so as to improve the accuracy of the algorithm and reduce the corresponding algorithm loss.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.