Key Information Extraction Algorithm of Different Types of Digital Archives for Cultural Operation and Management

To improve key information extraction from digital archives, a key information extraction algorithm for different types of digital archives is designed. The digital archive information is first preprocessed, taking parts of speech and marks as features of key information. A self-organizing feature mapping network is then used to extract the key information features of the archives, and semantic similarity is calculated from the feature extraction results. Combined with mutual information, the word with the highest mutual information value is taken as the center of each word set; all keywords are traversed, and the central words are taken as the key information of the digital archives, completing the extraction. Experiments show that the recall rate of the algorithm ranges from 96% to 99%, the extraction accuracy of key information is between 96% and 98%, and the average extraction time is 0.63 s, indicating good practical performance.


Introduction
Generally speaking, there are no management problems in the spontaneous stage of cultural production. Cultural production and business activities are the product of the commodity economy [1,2]. When material and cultural production develop to a certain extent, the social division of labor is further clarified, and professionals and professional groups engaged in cultural production appear, both the ruling class and the ruled class try to use cultural production to serve the interests of their own class; this is the conscious stage of cultural production [3]. Only at this conscious stage can cultural operation and management be put on the agenda. In a modern capitalist society, everything becomes a commodity, and cultural products are no exception: they have become special goods that capitalists produce for profit. The vast majority of cultural activities are restricted by the law of value of commodity production [4]. The commercialization of cultural production and cultural activities has become a common social phenomenon in the field of capitalist culture. The law of the market economy dominates cultural operation and management activities, and the quality of management is the key to the success or failure of specific cultural products in the free competition of the cultural market. With the rapid development of the social economy, the number of different types of digital archives, most of which are products of social culture, is gradually increasing. To manage these archives better, a new key information extraction algorithm for different types of digital archives is needed to raise the level of digital archive management. Therefore, it is of great significance to study a key information extraction algorithm for different types of digital archives.
In digital archives, information extraction is an important research topic. Reference [5] proposed a fast extraction algorithm for massive data in digital book archives. Based on the range characteristics of large-scale data attributes, the distribution samples of digital book archive data are divided into multiple subintervals to achieve data classification. A neuron model is constructed, the output error terms are determined from the outputs of the hidden layer and the output layer, and the weights of each layer of the BP neural network are adjusted. This method builds a fast extraction model based on a BP neural network and realizes the fast extraction of massive archive data. Reference [6] proposed a key information extraction algorithm based on TextRank and cluster filtering. First, the key information is extracted and vectorized with Word2Vec. Then, TextRank is improved by constructing a graph model that integrates word feature values and edge weights, and the stable graphs obtained by iterative convergence are merged and clustered to form clusters. Next, a cluster quality evaluation formula is designed for cluster filtering, and TextRank is applied to form the final clustering. Finally, the information type of each cluster is annotated. For a test text, the distance between each key information vector and the cluster center vectors is compared with the information types of the words, and information type and key information are combined to obtain the key information of the text. Reference [7] proposed a key information extraction algorithm based on an improved hidden Markov model. The web document is converted into a DOM tree and preprocessed, the information items to be extracted are mapped to states, and the observation items to be extracted are mapped to vocabulary. The improved hidden Markov model is then used to extract the key information of the text. Reference [8] proposed a key information extraction algorithm based on word vectors and location information. A word vector representation model learns a vector for each word in the target document, the latent semantic relations between words reflected by the word vectors are fused with location features into the PageRank score model, and the top-ranked words or phrases are chosen as the key information of the target document, completing the extraction of key information from digital archives. Reference [9] proposed a key information extraction algorithm for unstructured text in a knowledge database. A six-tuple is used to optimize the hidden Markov model, the probability model, and the smoothing of incomplete training samples. Initialization and termination operations are carried out on the sequences of observation values released at different times to obtain the optimal state sequence. After the observation sequence is decoded, the forward and reverse sequences are compared to filter out the states without decoding ambiguity and complete ambiguity elimination. According to the maximum-probability state sequence, the key text information to be extracted is defined and extracted.
However, when the above algorithms are applied to different types of digital archives, the results are not ideal because the boundary of key information extraction is uncertain. Therefore, this paper designs a new key information extraction algorithm for different types of digital archives. The algorithm first divides the main categories of key information, takes parts of speech and marks as features, and introduces a self-organizing feature mapping neural network to traverse the centers of the word sets, thus extracting key information quickly and accurately. The effectiveness of the algorithm is verified by experiments.

Digital Archive Processing
2.1.1. Text Word Segmentation. In the process of cultural management, there are many types of digital archives. Before the key information of different types of digital archives is extracted, the archive text must be preprocessed. Preprocessing includes word segmentation and marking. Word segmentation refers to splitting the text into words and setting marks according to their categories, which lays the foundation for the subsequent key information extraction [10]. The reverse maximum matching algorithm differs from the forward maximum matching algorithm in that its scan starts from the end of the string; each unsuccessful match removes the leading character of the candidate until the match succeeds. The basic idea of the bidirectional maximum matching algorithm is as follows: when segmenting different types of digital archive information, a forward maximum matching pass is first applied to the string to be processed, then a reverse maximum matching pass is applied, and the output results are used to complete the word segmentation. Assuming bidirectional maximum matching word segmentation for $S = (C_1, C_2, \ldots, C_i)$, where each $C$ is a single character, the algorithm can be described as follows:
(1) Take the first character $C_1$ of $S$ and search the dictionary for words that have $C_1$ as a prefix. If any exist, save them as word marks [11].
(2) Take the next character $C_2$ from $S$ and check whether the dictionary contains a word with $C_1 C_2$ as a prefix.
(3) If no such word exists, split $C_1$ off from $S$; one word split ends.
(4) If such a word exists, determine whether $C_1 C_2$ forms a word and count the number $n$ of dictionary words that begin with $C_1 C_2$.
(5) If $n = 0$, this round of segmentation ends [12].
(6) If $n \neq 0$, take the next character $C_i$ from $S$ and check whether the dictionary contains a word prefixed with $C_1 C_2 \ldots C_i$.
(7) If it does, repeat Step (6).
(8) If it does not, split $C_1 C_2 \ldots C_{i-1}$ off from $S$; one word split ends.
(9) Continue segmentation from character $C_i$ of $S$, repeating the above steps until the forward segmentation of $S$ is finished.
(10) Take the last character $C_n$ of $S$ and search the dictionary for words that have $C_n$ as a suffix. If any exist, save them as word marks [13].
(11) Then take the character $C_{n-1}$ from $S$ and check whether the dictionary contains a word with $C_{n-1} C_n$ as a suffix.
(12) If no such word exists, split $C_n$ off from $S$; one word split ends.
(13) If such a word exists, determine whether $C_{n-1} C_n$ forms a word and count the number of dictionary words that end with $C_{n-1} C_n$, expressed by $n$.
(14) If $n = 0$, this round of segmentation ends.
(15) If $n \neq 0$, take the preceding character $C_i$ from $S$ and check whether the dictionary contains a word with $C_i \ldots C_{n-1} C_n$ as a suffix.
(16) If it does, repeat Step (15).
(17) If it does not, $C_i \ldots C_{n-1} C_n$ is cut out from $S$, and one word split ends.
(18) Continue segmentation from character $C_i$ of $S$, repeating the above steps until the reverse segmentation of $S$ is finished, so that stop words can then be removed.
The specific implementation process is shown in Figure 1.
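The forward and reverse passes described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: it uses the common longest-match formulation instead of the step-by-step prefix counting, the `max_len` parameter and all function names are hypothetical, and the rule for choosing between the two results (fewer words, then fewer single characters, preferring the reverse pass on a tie) is an assumption, since the text does not state how the two outputs are combined.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan can advance.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, dictionary, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    return words[::-1]  # restore reading order


def bidirectional_max_match(text, dictionary, max_len=4):
    """Run both passes and keep one result (selection rule is an assumption)."""
    forward = forward_max_match(text, dictionary, max_len)
    backward = reverse_max_match(text, dictionary, max_len)

    def score(segmentation):
        singles = sum(1 for w in segmentation if len(w) == 1)
        return (len(segmentation), singles)

    return forward if score(forward) < score(backward) else backward


# Toy example: letters stand in for the characters C_1 ... C_n.
toy_dictionary = {"ab", "abc", "cd", "d"}
print(bidirectional_max_match("abcd", toy_dictionary))
```

In the toy example the forward pass yields `abc` + `d` while the reverse pass yields `ab` + `cd`; the assumed tie-break keeps the reverse result because it contains no isolated single character.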

Part-of-Speech Tagging.
Part of speech is a grammatical attribute of vocabulary, which generally indicates the type of a word in the corpus. Part-of-speech (POS) tagging refers to the process and method of labeling each word with its part of speech. Some words have multiple parts of speech, and the same word is used in completely different ways under different parts of speech [14,15]. In general, however, when a word has more than one part of speech, the frequency of its most common part of speech is far greater than that of the others, so the accuracy of POS tagging can be ensured on the whole, and POS tagging can be applied to most application scenarios [16]. The conditional random field (CRF) algorithm was proposed by Lafferty et al. in 2001. It is an undirected graph model that combines the characteristics of the maximum entropy model and the hidden Markov model. In recent years, it has achieved good results in sequence tagging tasks such as word segmentation, part-of-speech tagging, and named entity recognition [17]. One of the simplest conditional random fields is the chain structure, in which the chain is composed of several character marks. In a first-order chain CRF model, the fully connected subgraph covers the current marker and the marker before it, as well as the maximum connected graph of any subset of the observation sequence. The chained conditional random field is shown in Figure 2, and the set of vertices can be regarded as the maximum connected subgraph.
In the sequence labeling task, the random variable $X = (x_1, x_2, \ldots, x_n)$ represents the observable sequence, and the random variable $Y = (y_1, y_2, \ldots, y_n)$ represents the corresponding marker sequence [18]. The chained conditional probability distribution of $Y$ given $X$ is defined in terms of the following quantities. $f_k(y_{i-1}, y_i, x)$ is the transition feature function defined on the edges, which captures mark transfer features; a non-negative factor is associated with each node. $f'_k(y_i, x)$ is the state feature function defined on the nodes, which captures the features of the current mark. $\lambda_k$ and $\lambda'_k$ are the model parameters to be learned [19] and represent the weights of the feature functions. $Z(x)$ is a normalizing factor that depends only on the observation sequence.
Conditional random field inference refers to finding the most probable marker sequence $Y = (y_1, y_2, \ldots, y_n)$ given an observation sequence $X = (x_1, x_2, \ldots, x_n)$. In the distribution function of conditional random fields, the normalizing factor is completely independent of the marker sequence [20]. Therefore, given the model parameters, the most likely marker sequence is the one that maximizes the unnormalized score. When the current sequence position is $i$ and the current label is $y$, a Viterbi-style dynamic-programming recursion can be used to obtain the unnormalized probability value of the optimal label sequence up to the current position.
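For reference, the quantities defined above fit the standard linear-chain CRF formulation, which can be written as below. This is a reconstruction from the symbol definitions in the text and the textbook form of the model, not a verbatim copy of the paper's equations; the shorthand $s(Y, x)$, the optimal sequence $Y^{*}$, and the table $\delta_i(y)$ are notation introduced here for illustration, and the recursion is stated in the log domain (the unnormalized probability is $\exp \delta_i(y)$).

```latex
\begin{align}
s(Y, x) &= \sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, x)
         + \sum_{i=1}^{n} \sum_{k} \lambda'_k f'_k(y_i, x) \\
P(Y \mid X) &= \frac{1}{Z(x)} \exp\bigl( s(Y, x) \bigr) \\
Z(x) &= \sum_{Y'} \exp\bigl( s(Y', x) \bigr) \\
Y^{*} &= \arg\max_{Y} P(Y \mid X) = \arg\max_{Y} \, s(Y, x) \\
\delta_i(y) &= \max_{y'} \Bigl[ \delta_{i-1}(y') + \sum_{k} \lambda_k f_k(y', y, x) \Bigr]
             + \sum_{k} \lambda'_k f'_k(y, x)
\end{align}
```

Because $Z(x)$ does not depend on $Y$, it drops out of the maximization, which is why the recursion only needs the unnormalized scores.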

Key Information Feature Extraction of Digital Archives.
The self-organizing feature mapping neural network was proposed in 1981 by a neural network expert and professor at the University of Helsinki, Finland [21]. This network simulates the self-organizing feature mapping function of the brain's nervous system. It is a competitive learning network that can carry out self-organizing learning without supervision [22]. This paper uses this method to extract the key information features of different types of digital archives, which can improve the accuracy and efficiency of extracting key information from archives. The structure of the self-organizing feature mapping neural network is shown in Figure 3.
We set the number of neurons in the input layer to $N$ and the number of neurons in the competition layer to $M = m^2$. The input layer and the competition layer form a two-dimensional planar array. The two layers are connected, and sometimes neurons in the competition layer are also connected to each other by lateral inhibition [23]. There are two kinds of connection weights in the network: the connection weights with which neurons respond to external inputs, and the connection weights between neurons, whose size controls the strength of the interactions between neurons [24,25]. The connections between each input neuron and the competition-layer neurons in the network structure of Figure 3 are extracted and shown in Figure 4. Set the input mode of the network as $P_k = (p_1^k, p_2^k, \ldots, p_N^k)$, $k = 1, 2, \ldots, q$, and the neuron vector of the competition layer as $A_j = (a_{j1}, a_{j2}, \ldots, a_{jm})$, $j = 1, 2, \ldots, m$, where $P_k$ is a continuous value and $A_j$ is a numerical quantity. The connection vector between neuron $j$ of the competition layer and the neurons of the input layer is $W_j = (w_{j1}, w_{j2}, \ldots, w_{jN})$, $j = 1, 2, \ldots, M$. The self-organizing learning process of the self-organizing feature mapping network can also be described as follows: for each input of the network, only part of the weights are adjusted to make the weight vector closer to or further from the input vector.
This adjustment process is competitive learning. With continuous learning, all weight vectors are separated from each other in vector space, each forming a class of patterns representing the input space; this is the clustering function of automatic feature recognition in a self-organizing feature mapping network. The learning and working rules of the network are as follows:
(1) Initialization. Assign each network connection weight $w_{ij}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, M$, a random value in the interval $[0, 1]$. Determine the initial value $\eta(0)$ of the learning rate $\eta(t)$, $0 < \eta(t) < 1$. Determine the initial value $N_g(0)$ of the neighborhood $N_g(t)$. The neighborhood $N_g(t)$ is essentially a region centered on the winning neuron $g$ that contains several neurons. This area is generally uniformly symmetrical, most typically a square or circular area. The value of $N_g(t)$ represents the number of neurons in the neighborhood during the $t$-th learning step. Determine the total number of learning steps $T$.
(2) Provide one of the $q$ learning modes, $P_k$, to the input layer of the network and normalize it.
(3) Normalize the connection weight vector $W_j = (w_{j1}, w_{j2}, \ldots, w_{jN})$ and calculate the Euclidean distance $d_j$ between $W_j$ and $P_k$.
(4) Find the minimum distance $d_g$ and determine the winning neuron $g$.
(5) Adjust the connection weights: modify the connection weights between all neurons in the neighborhood $N_g(t)$ of the competition layer and the neurons of the input layer, using the learning rate $\eta(t)$ at moment $t$.
(6) Select another learning mode, provide it to the input layer of the network, and return to step (3) until all $q$ learning modes have been provided to the network.
(7) Update the learning rate $\eta(t)$ and the neighborhood $N_g(t)$. Here $\eta(0)$ is the initial learning rate, $t$ is the number of learning steps completed, and $T$ is the total number of learning steps. Assume that the coordinate of a neuron $g$ of the competition layer in the two-dimensional array is $(x_g, y_g)$; the neighborhood is then the square whose upper-right corner is the point $(x_g + N_g(t), y_g + N_g(t))$ and whose lower-left corner is the point $(x_g - N_g(t), y_g - N_g(t))$, and the updated neighborhood size is obtained with $\mathrm{INT}(\cdot)$, the integer-rounding function.
(8) Let $t = t + 1$ and return to step (2) until $t = T$.
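Steps (1) to (8) can be sketched in a few lines of plain Python. The sketch below is illustrative only and rests on several assumptions that the text does not confirm: unit-length normalization $P_k / \lVert P_k \rVert$ for inputs and weights, the usual Kohonen update $w_{ji} \leftarrow w_{ji} + \eta(t)\,(p_i^k - w_{ji})$, and linear decay $\eta(t) = \eta(0)(1 - t/T)$ and $N_g(t) = \mathrm{INT}[N_g(0)(1 - t/T)]$. The function names and default values (`eta0`, `radius0`, `seed`) are hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def winner(weights, pattern):
    """Index g of the neuron whose weight vector is closest to the pattern."""
    distances = [math.dist(w, pattern) for w in weights]
    return distances.index(min(distances))


def train_som(patterns, m, total_steps, eta0=0.5, radius0=None, seed=0):
    """Train an m x m map; returns the M = m*m weight vectors in row-major order."""
    rng = random.Random(seed)
    n_inputs = len(patterns[0])
    radius0 = m // 2 if radius0 is None else radius0
    # Step (1): random initial weights in [0, 1].
    weights = [[rng.random() for _ in range(n_inputs)] for _ in range(m * m)]
    for t in range(total_steps):                       # step (8): loop until t = T
        # Step (7), assumed linear decay of learning rate and neighborhood radius.
        eta = eta0 * (1 - t / total_steps)
        radius = int(radius0 * (1 - t / total_steps))
        for pattern in patterns:                       # step (6): present every mode
            p = normalize(pattern)                     # step (2)
            weights = [normalize(w) for w in weights]  # step (3)
            g = winner(weights, p)                     # step (4)
            gx, gy = divmod(g, m)
            for j, w in enumerate(weights):
                jx, jy = divmod(j, m)
                # Square neighborhood centered on the winner.
                if abs(jx - gx) <= radius and abs(jy - gy) <= radius:
                    for i in range(n_inputs):          # step (5)
                        w[i] += eta * (p[i] - w[i])
    return weights


if __name__ == "__main__":
    toy_patterns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    som = train_som(toy_patterns, m=3, total_steps=20)
    for pattern in toy_patterns:
        print(pattern, "-> winning neuron", winner(som, normalize(pattern)))
```

After training, each feature vector can be mapped to its winning neuron, and neurons that win for similar inputs act as the feature clusters used in the later similarity calculation.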

Key Information Extraction Algorithm of Digital Archives.
In the process of key information extraction, the individual words of a digital archive cannot be understood in isolation if the key information is to be extracted effectively. Instead, words that are similar to or correlated with each other should be combined into blocks, so that the content of the whole text and the exact meaning of each word can be understood comprehensively. Therefore, the semantic similarity between words is used as the clustering distance. All the sememes of a word form a tree-like hierarchical structure according to their upper and lower positional relations, and this tree is traversed to measure the distance between sememes. The distance between words can then be used to judge the similarity of word meaning. In the word distance calculation, $p_1$ and $p_2$ represent two sememes, which are variable parameters, and $dist(p_1, p_2)$ represents the length of the path between the two sememes. The sememes that describe a concept are divided into four parts: the first basic sememe, the symbolic sememes, the relational sememes, and the other independent sememes. The overall similarity between concepts is calculated from these parts, where $s_1$ and $s_2$ represent two concepts and $y_i$ represents the result of feature extraction. If there are two words $w_1$ and $w_2$ in the set, where $w_1$ has $n$ concept descriptions and $w_2$ has $m$ concept descriptions, the maximum similarity between the concepts of $w_1$ and $w_2$ is taken as the semantic similarity of the two words.
The process of the key information extraction algorithm of digital archives is as follows:
Preprocessing: perform word segmentation on the digital archival text and filter out stop words.
Step 1: Calculate the semantic similarity $Sim(w_i, w_j)$ between all candidate words $w_i$ and $w_j$ in the digital archival text.
(1) Calculate the TF-IDF value, and select the words $W = \{W_1, W_2, \ldots, W_N\}$ whose TF-IDF-weighted frequency is greater than the threshold $t$ as the candidate key information. In the TF-IDF calculation, $tf_i$ is the number of occurrences of the word in the current digitized archival text, $N$ is the total number of digitized archival texts, and $n_i$ is the number of digitized archives in the database that contain the word $w_i$.
(2) During initialization, each candidate word $W_i$ forms its own cluster $Z_i$, giving $n$ clusters in total, all marked as unvisited.
(3) Among all unvisited word clusters, select the cluster pair $(C_l, C_k)$ with the largest similarity, that is, the closest distance, by finding the maximum value of $Sim(w_i, w_j)$. If $Sim(C_l, C_k)$ is less than the given threshold, go to (5); otherwise, merge clusters $C_l$ and $C_k$ into a new cluster $C_0 = C_l \cup C_k$. Set $C_0$ as the current cluster $C$, mark $C$ as unvisited, and mark $C_l$ and $C_k$ as visited.
(4) Calculate the semantic similarity among all unvisited word clusters, and return to (3).
(5) After clustering, select the first $k$ words of better quality from each cluster $Z_i$ as the final key information, so as to obtain the candidate word set $W = \{C_1, C_2, \ldots, C_m\}$.
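The candidate selection in sub-step (1) can be illustrated with a short sketch. It assumes the common form $\mathrm{TFIDF}_i = tf_i \cdot \log(N / n_i)$, which is consistent with the definitions of $tf_i$, $N$, and $n_i$ above but is not confirmed by the text, and it assumes that the threshold is applied to the TF-IDF score; the function name and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_candidates(doc_tokens, corpus, threshold):
    """Return {word: score} for the words of one document whose TF-IDF exceeds threshold.

    doc_tokens: list of words in the current archive text (after segmentation).
    corpus:     list of token lists, one per archive text in the database.
    """
    n_docs = len(corpus)                  # N: total number of archive texts
    term_counts = Counter(doc_tokens)     # tf_i: occurrences in the current text
    scores = {}
    for word, count in term_counts.items():
        containing = sum(1 for doc in corpus if word in doc)  # n_i
        if containing == 0:
            continue                      # word absent from the corpus, skip it
        score = count * math.log(n_docs / containing)
        if score > threshold:
            scores[word] = score
    return scores


toy_corpus = [
    ["archive", "key", "key"],
    ["archive", "index"],
    ["archive", "scan"],
]
print(tfidf_candidates(toy_corpus[0], toy_corpus, threshold=1.0))
```

In the toy corpus, "archive" appears in every text and therefore scores zero, while "key" is frequent in one text and rare elsewhere, so only "key" survives as a candidate.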
Step 2: Treat each word in the text as a set $C_i$, giving $N$ sets in total ($N$ is the number of words in the text).
Step 3: Select the two sets $C_i$ and $C_j$ with the greatest similarity from the $N$ sets, and merge them into a new set $C$.
Step 4: Select the center point of the current set: calculate the sum of the mutual information between each word in the current set and the words outside the set, and select the word with the largest mutual information value as the center point of the current set. A large mutual information value between two words indicates that their correlation is relatively strong; a small value indicates that it is relatively weak. The mutual information $I(w_i, w_j)$ between $w_i$ and $w_j$, that is, the information they share, is computed from $p(w_i, w_j)$, the co-occurrence frequency of $w_i$ and $w_j$, and from $p(w_i)$ and $p(w_j)$, the individual frequencies of $w_i$ and $w_j$. When $I(w_i, w_j) > 0$, the greater the value, the more information $w_i$ and $w_j$ share and the stronger the correlation; when $I(w_i, w_j) = 0$, there is little shared information between $w_i$ and $w_j$ and the correlation is weak; when $I(w_i, w_j) < 0$, there is no correlation between $w_i$ and $w_j$.
Step 5: Among the words outside the set, select the word with the highest similarity to the center point of the set. If the similarity value is greater than the threshold, add it to the current set $C$; then calculate the mutual information between the center point of the current set and the words outside the set, and add the word with the largest mutual information value to the current set $C$.
Step 6: Return to Step 4 to update the center point of the current set until all words have been visited. If the mutual information between the center point of the set and the other words outside the set is less than 0, perform Step 3 for the remaining unvisited words until all words have been visited and assigned.
Step 7: In the final cluster set, select the first $K$ central words as the key information of the text. The flow of the key information extraction algorithm for different types of digital archives is shown in Figure 5.
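The center selection of Step 4 can be sketched as follows, assuming the usual pointwise mutual information $I(w_i, w_j) = \log_2 \frac{p(w_i, w_j)}{p(w_i)\,p(w_j)}$, which matches the quantities and the sign discussion in Step 4 but is a reconstruction rather than the paper's stated formula. Frequencies are estimated here from document-level co-occurrence, and word pairs that never co-occur are skipped in the sum; both choices, like the function names, are assumptions made for illustration.

```python
import math


def pmi(word_a, word_b, documents):
    """Pointwise mutual information (base 2) from document-level co-occurrence.

    documents: list of sets of words. Returns None when the pair never co-occurs,
    because the logarithm is undefined in that case.
    """
    n = len(documents)
    p_a = sum(1 for d in documents if word_a in d) / n
    p_b = sum(1 for d in documents if word_b in d) / n
    p_ab = sum(1 for d in documents if word_a in d and word_b in d) / n
    if p_ab == 0:
        return None
    return math.log2(p_ab / (p_a * p_b))


def cluster_center(cluster, vocabulary, documents):
    """Word in the cluster with the largest summed PMI to words outside the cluster."""
    outside = [w for w in vocabulary if w not in cluster]

    def summed_pmi(word):
        values = (pmi(word, other, documents) for other in outside)
        return sum(v for v in values if v is not None)

    # sorted() makes the choice deterministic when two words tie.
    return max(sorted(cluster), key=summed_pmi)


toy_documents = [
    {"archive", "catalog"},
    {"archive", "catalog"},
    {"file", "scan"},
    {"scan"},
    {"scan"},
]
toy_vocabulary = {"archive", "catalog", "file", "scan"}
print(cluster_center({"archive", "file"}, toy_vocabulary, toy_documents))
```

In the toy data, "archive" always appears together with "catalog", so its summed mutual information with the outside words is larger than that of "file", and it is chosen as the center of the set.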

Experimental Scheme.
To verify the effectiveness of the algorithm designed in this paper for extracting archive information, simulation experiments were conducted. Because this is a simulation experiment, the experimental parameters had to be designed: after considering various factors and comparing several kinds of simulation software and computers, the environmental parameters of the simulation experiment were set as shown in Table 1.
During the experiment, 500 GB of digital archives were randomly selected from schools, enterprises, and relevant administrative units as the data set; 450 GB of them were randomly selected as the training set to train this method, and the remaining 50 GB were used as the test set to test the key information extraction performance for different types of digital archives. To ensure the objectivity of the experiment, titles and core prompts were filtered out during key information extraction. Recall rate and accuracy rate are often used as indicators of the key information extraction effect for different types of digital archives. In the definitions of the recall rate $R$ and the accuracy rate $P$ adopted in this experiment, $l$ represents the number of extracted key information items, $j$ represents the actual number of key information items, and $L$ represents the number of key information items accurately extracted. The time consumed in extracting the key information of different types of digital archives is calculated from $t_i$, the time taken by the $i$-th key information extraction step.
The recall rates of key information extraction for different types of digital archives obtained by the algorithm of reference [5], the algorithm of reference [6], the algorithm of reference [7], and the algorithm of this paper are compared. The results are shown in Figure 6. The data in Figure 6 show that the recall rate of the algorithm in reference [5] varies in the range of 58%-85%, that of the algorithm in reference [6] in the range of 49%-79%, and that of the algorithm in reference [7] in the range of 50%-87%. By comparison, the recall rate of the algorithm of this paper varies in the range of 96%-99% and is always higher than that of the comparison algorithms, which shows that this algorithm can extract the key information of digital archives comprehensively, with higher integrity.
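The evaluation quantities defined earlier in this section can be illustrated with a small sketch. It uses the conventional definitions, recall $= L / j$ and accuracy (precision) $= L / l$, and takes the total time as the sum of the step times $t_i$; these forms are assumptions based on the symbol definitions and may differ in detail from the formulas used in the experiment. The function names and the toy keyword lists are hypothetical.

```python
def extraction_metrics(extracted, reference):
    """Return (recall, precision) for one archive.

    extracted: key information items produced by the algorithm (l items).
    reference: the actual key information items (j items).
    """
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)              # L: accurately extracted items
    recall = correct / len(reference) if reference else 0.0
    precision = correct / len(extracted) if extracted else 0.0
    return recall, precision


def total_time(step_times):
    """Total extraction time as the sum of the per-step times t_i (in seconds)."""
    return sum(step_times)


r, p = extraction_metrics(["a", "b", "c", "d"], ["a", "b", "c", "e", "f"])
print(f"recall={r:.2f}, precision={p:.2f}, time={total_time([0.2, 0.3, 0.13]):.2f} s")
```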

Analysis and
The key information extraction accuracy for different types of digital archives obtained by the algorithm of reference [5], the algorithm of reference [6], the algorithm of reference [7], and the algorithm of this paper is compared. The results are shown in Figure 7.
The data in Figure 7 show that the extraction accuracy of the algorithm of reference [5] is 49%-85%, that of the algorithm of reference [6] is 54%-80%, and that of the algorithm of reference [7] is 56%-80%. By comparison, the extraction accuracy of the algorithm of this paper is 96%-98%. On the whole, the key information extraction accuracy of this algorithm is relatively stable, without large upward or downward fluctuations, which indicates that the algorithm is reliable in extracting key information. Its higher accuracy achieves the goal of accurately extracting the key information of different digital archives.
The extraction time of key information for different types of digital archives obtained by the algorithm of reference [5], the algorithm of reference [6], the algorithm of reference [7], and the algorithm of this paper is compared. The comparison results are shown in Table 2.
The results in Table 2 show that the average time for key information extraction is 1.41 s for the algorithm of reference [5], 1.39 s for the algorithm of reference [6], and 1.49 s for the algorithm of reference [7], which is the highest among the four algorithms. By comparison, the average extraction time of the algorithm of this paper is 0.63 s; the extraction time is shorter and the efficiency is higher, so the key information of digital archives can be extracted rapidly.
To sum up, the recall rate of this algorithm varies in the range of 96%-99%, the accuracy of key information extraction is 96%-98%, and the average time for key information extraction is 0.63 s. The algorithm achieves rapid and accurate extraction of the key information of digital archives, solves a variety of problems existing in traditional methods, and can be widely used in many fields.

Conclusions
With the continuous optimization of cultural operation and management strategies, the level of cultural operation and management has gradually improved, and digital archive management is an important part of it. Extracting the key information of different types of digital archives is therefore of great significance to the level of cultural operation and management. For this reason, this paper designs a key information extraction algorithm for different types of digital archives for cultural operation and management. The experimental results show that the recall rate of the algorithm is between 96% and 99%, the accuracy of key information extraction is 96%-98%, and the average time for key information extraction is 0.63 s. The algorithm achieves rapid and accurate extraction of the key information of digital archives and can be widely used in cultural operation and management to improve its quality to the greatest extent and promote the further development of the cultural industry. However, the convergence of the algorithm was not tested during operation. To avoid falling into a local optimum, future research needs to further optimize the algorithm so as to avoid too many iterations or high errors.

Data Availability
The dataset can be accessed upon request.

Conflicts of Interest
The authors declare no conflicts of interest.