Emotional Calculation Method of Rural Tourist Based on Improved SPCA-LSTM Algorithm

New technologies such as big data and cloud computing provide new means and tools for rural development. Rural tourism big data, with its convenience, speed, and low threshold, offers great convenience for tourists' emotional calculation and has become one of the main sources of tourism big data. Guided by big data theory and emotion theory, this paper proposes an emotional calculation method for rural tourists based on an improved SPCA-LSTM algorithm, taking big text data as the data source. First, an improved TF-IDF algorithm is designed to highlight the importance of feature items, and word vectors trained by the word2vec model are applied to represent the rural tourism text data. Then, a reweighted sparse PCA (RSPCA) is constructed to reduce the dimension of the massive word-vector features. RSPCA introduces a weighted l1 optimization framework and the LASSO regression model into the mathematical model of the PCA algorithm and establishes a new data dimension reduction model. A long short-term memory network with an attention mechanism is then employed to extract text features. Finally, the feature vector is used to calculate the rural tourist's emotion via the softmax function. The experimental results indicate that the improved SPCA-LSTM algorithm outperforms existing algorithms on every performance index and is effective in calculating tourists' emotions, making it well suited to tourist sentiment research in the era of big data.


Introduction
Rural revitalization is an important part of urban economic development and a beneficial aid to urban-rural integration. Rural tourism is a relatively efficient path that China has explored during more than 30 years of poverty alleviation, and a further way to address rural revitalization during the "14th five-year plan" [1]. New technologies such as artificial intelligence, 5G, and big data bring infinite possibilities to China's economic and social development, provide new means and tools for rural development, and powerfully support rural revitalization in the new era. Digital technology empowering rural revitalization not only changes the traditional mode of production and lifestyle but also subtly changes farmers' practices and ways of thinking.
Promoting industrial prosperity is the endogenous power supporting rural revitalization. Rural characteristics and the laws of information development should be respected. Villages with rich local characteristic resources and relatively complete information facilities can rely on the "Internet +" to tap local characteristic cultural resources, combine digital "cultural tourism" with "agricultural tourism," and build Internet characteristic tourism villages [2].
With the rise of new media and social networks and the advent of the era of big data, various platforms have generated hundreds of millions of rural tourism-related records. Past data show that since 2009, the amount of data related to rural tourism has increased geometrically year by year. Due to convenient transportation and the development of social networks, data with "rural tourism" as the keyword spread widely, and people's demand for rural tourism is increasing day by day. Such data can be analyzed and mined through various information release platforms [3,4].
Rural tourism reception will increase steadily. Data show that from January to August 2020, the number of visits to leisure agriculture and rural tourism in China decreased by 60.9% to 1.207 billion. The COVID-19 epidemic pressed the "pause button" for leisure agriculture and rural tourism. However, after the domestic epidemic subsided, localities restarted the rural leisure tourism market in an orderly manner, and people's enthusiasm for rural scenery and fresh air remains high. With the gradual restoration of production and living order, the suppressed demand of urban and rural residents will continue to be released. The countryside, with its beautiful scenery and ecology, is more attractive than ever before [5].
The rapid development of information technology provides an efficient and rapid way to acquire rural tourist information [6]. The main reason for the popularity of rural tourism is that it provides tourists with multilevel experiences, and this information can be obtained through the classification of rural tourism big data driven by the rural revitalization strategy. The trend of rural tourism can be analyzed by type, time, space, and group. Data for specific keywords can be searched and collected on various platforms and deeply analyzed to obtain corresponding results. The long-term data are verified and used as prediction support, and data mining of the future rural tourism market can yield suggestions for future rural tourism [7].
Rural tourism is an activity of interpersonal communication and emotional exchange, and emotion runs through the whole process of tourism activities. "Emotion is a person's attitude experience about whether objective things can meet his own needs." Tourist emotion refers to the pleasure, excitement, sadness, anger, regret, and other emotional experiences generated in tourism activities when personal factors or the external environment affect whether those activities meet individual basic needs and social needs; it presents diversity and variability as the tour progresses [8]. These emotions not only constitute an important part of the tourist experience but also exert an important influence on tourist motivation, satisfaction, behavioral intention, and interpersonal interaction. During travel, tourists obtain information and share their experiences through online platforms and social media; the text, images, audio, and video they release have become the main data sources of tourism big data [9]. Among them, text content, being convenient, simple, intuitive, fast, and low-threshold for expressing emotions and exchanging information, occupies an increasingly important position in tourism big data. Mining text data can provide decision support for tourism planning and marketing, making sentiment analysis of tourism big data a hot issue in tourism research [10].
With the deepening of research, emotion analysis, which effectively analyzes and mines information to identify emotional tendencies, has become more sophisticated. There is also research on emotion classification, emotion summarization, product attribute mining, and so on. In recent years, with the development of big data, many emotion analysis models and software tools have appeared at home and abroad, providing strong support for emotion research. There are three methods of text sentiment analysis: dictionary-based methods, machine learning methods, and deep learning methods [11]. The dictionary-based approach mainly develops a set of emotional dictionaries and rules; sentiment values are then calculated by sentence splitting, analysis, and dictionary matching, and the emotional value is finally used to judge the emotional tendency of the text. Although the dictionary-based method has high accuracy, it has a low recall rate, and the cost of dictionary construction is relatively high [12]. The machine-learning-based approach transforms text sentiment analysis into a supervised classification problem: the training text is annotated, supervised machine learning is carried out [13], and the trained model finally predicts results on the test data. This approach generally includes two steps: text preprocessing and selection of the classification algorithm. Methods based on deep learning use artificial neural network models to map big text data into a vector space to obtain numerical representations of words; these numerical expressions are then used as input to the deep learning model, and the optimal model is obtained through training and parameter optimization. The process is much the same as that of machine-learning-based emotion computing.
However, the selection of feature extraction and the classification model for text vectorization differs from machine learning [14,15]. Based on the above research, this paper proposes a rural tourist emotional calculation method using the improved SPCA-LSTM algorithm. The algorithm improves the traditional TF-IDF algorithm by considering the distribution and location information of feature items within and between classes and combines it with Word2vec word vectors to represent the text. Then, the reweighted SPCA algorithm is used to reduce the dimensionality of the text word vectors, so that the original data features of the samples are retained as much as possible while the dimensionality is reduced. Finally, a long short-term memory (LSTM) network with an attention mechanism is used to realize the emotional computing of rural tourists.

2. Tourist Emotion Calculation Method in This Paper

2.1. Text Representation and Feature Selection (Word Vector Representation). In view of the shortcomings of traditional TF-IDF in the field of text classification, this paper improves the traditional TF-IDF algorithm to better reflect the importance of feature items in the text.
(1) Introducing an intraclass factor to describe the distribution of feature items. The intraclass factor interC_i is introduced to judge the uniformity of the distribution of feature words among the documents within a class. A small intraclass factor value means the feature item is unevenly distributed in the class, possibly appearing in only a few documents, and thus has weak classification ability; conversely, a large value means strong classification ability. The intraclass factor of a feature item is calculated by formula (1), where j represents the category and S_ij represents the standard deviation of feature item i in category j, which reflects whether the feature item is evenly distributed in the category. The smaller the standard deviation, the more uniform the distribution and the stronger the classification ability of the feature word. The calculation of S_ij is shown in formula (2).
where K is the total number of documents in category j, tf_ip indicates the number of times document p contains feature i, and tf̄_ij represents the mean frequency of feature word i over all documents in category j, tf̄_ij = (1/K) Σ_{k=1}^{K} tf_ik.
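The standard deviation in formula (2) can be computed directly; a minimal sketch with toy counts (the frequencies below are illustrative assumptions, not data from the paper):

```python
import numpy as np

# S_ij: standard deviation of feature i's frequency across the K documents
# of category j, per formula (2). The counts below are toy assumptions.
tf = np.array([3, 0, 4, 2, 1])   # times feature i appears in each of K=5 docs
tf_mean = tf.mean()              # mean frequency of feature i in category j
S_ij = np.sqrt(np.mean((tf - tf_mean) ** 2))
print(float(tf_mean), float(round(S_ij, 4)))  # → 2.0 1.4142
```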
(2) Improving the discrimination between classes. To increase the weight of feature items with strong classification power, m/m_i is used to express the importance of feature item i among categories, and N/n_i indicates its importance across all documents, where N represents the total number of documents in the corpus, n_i the number of documents in which feature word i appears, m the total number of categories, and m_i the number of categories containing the feature item. A larger m/m_i value indicates that the feature item is distributed in fewer categories and has strong classification ability; a small value means the feature item is not representative. TF_ij represents the word frequency of feature i relative to category j: the higher TF_ij, the better feature i represents that category. IDF_i represents the distribution ratio of feature i among categories: the higher IDF_i, the stronger the category discrimination ability of feature i.

(3) Word distance factor
The word distance is the difference between the last and first positions at which a feature item appears in a document, reflecting the range the feature item covers in the text. The larger this range, i.e., the larger the word distance, the better the feature item reflects the category of the document. Conversely, if a feature item is distributed over only a small range and the word distance is small, it cannot represent the document's category well. If a feature item appears frequently only in one local part of a document, its TF value is inflated, distorting the final result; the word distance factor is introduced to avoid this problem. It is calculated as

WDF_ip = (last(g_ip) − first(g_ip)) / fea(p),

where last(g_ip) represents the position number of the last appearance of feature item i in document p, first(g_ip) represents the position number of its first appearance, and fea(p) represents the total number of feature items in document p.
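As a concrete illustration, the word distance factor can be computed from token positions; the document and feature word below are made-up examples, not corpus data:

```python
# Word distance factor (WDF): span between a feature's last and first
# occurrence, normalized by document length. Tokens here are illustrative.
def word_distance_factor(tokens, feature):
    positions = [k for k, tok in enumerate(tokens) if tok == feature]
    if not positions:
        return 0.0  # feature absent: no span to measure
    return (positions[-1] - positions[0]) / len(tokens)

doc = ["scenery", "village", "scenery", "food", "scenery", "hotel"]
print(word_distance_factor(doc, "scenery"))  # spans positions 0..4 of 6 tokens
```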
Considering the intraclass, interclass, and position factors of feature items together, a feature-weight calculation method for text classification, TF-IDF-ICP (intraclass factor, category factor, and position), is proposed:

TF-IDF-ICP_ip = TF_i × IDF_i × interC_i × WDF_ip,

where TF_i represents the word frequency of feature i relative to categories, IDF_i the distribution ratio of feature i among categories, interC_i the intraclass factor, and WDF_ip the word distance factor.

An artificial neural network can only accept numerical input, not a word as a string. To enable the deep learning model to process text data, natural language must first be expressed as numerical vectors the model can recognize. Word2vec is a simple shallow artificial neural network that, given a large corpus, produces word vectors through training and optimization. These word vectors quickly and effectively express a word as a numerical vector and measure the similarity between words well, thereby representing the different attributes of the word.

In text classification, the phrases in the text must be converted into low-dimensional word vectors. First, the text d is segmented by jieba; after segmentation, the text is D_i = [w_1, w_2, …, w_i, …, w_n], where w_i stands for the ith word in the document and n represents the total number of words in the document. Word2vec is then used to vectorize the text, expressing w_i as [v_1, …, v_i, …, v_l], where l is the dimension of the word vector, i.e., each word is represented in l dimensions. Word vectors trained by Word2vec retain the relevance of words in the original corpus but ignore the importance of different words in the text. Therefore, this paper uses the improved TF-IDF-ICP algorithm to calculate the weight of each word and combines it with the Word2vec word vector, where D stands for the resulting weighted Word2vec word vector.
The specific steps of weighting Word2vec are shown in Figure 1.
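A minimal sketch of the weighting step (the vectors and weights below are random stand-ins for a trained Word2vec model and computed TF-IDF-ICP weights):

```python
import numpy as np

# Each word's Word2vec vector is scaled by its TF-IDF-ICP weight and the
# document is represented by the normalized weighted sum. Vectors and
# weights here are illustrative assumptions, not trained values.
def weighted_doc_vector(word_vectors, weights):
    vecs = np.array([weights[w] * word_vectors[w] for w in word_vectors])
    return vecs.sum(axis=0) / sum(weights.values())

rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=4) for w in ["scenery", "food", "hotel"]}
weights = {"scenery": 0.8, "food": 0.5, "hotel": 0.2}  # TF-IDF-ICP stand-ins
doc_vec = weighted_doc_vector(word_vectors, weights)
print(doc_vec.shape)  # → (4,)
```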

2.2. Dimension Reduction of Word Vectors

Maintaining such high-dimensional word vectors is a burden for the training and prediction of machine learning models. In this paper, an improved SPCA is designed to reduce the dimension of the word vectors and thus of the dataset. The basic features are then selected to minimize information loss while compressing the data.
(1) Reweighted ℓ1 optimization framework. The ℓ1 optimization problem originates from the ℓ0 optimization problem, which can be expressed as follows: given an m × n matrix A and a nonzero vector b, with m ≤ n, find a sparse solution of Ay = b. The mathematical form is

min_{y∈R^n} ∥y∥_0, s.t. Ay = b, (8)

where ∥y∥_0 represents the number of nonzero elements of y.
Since Equation (8) is a nonconvex, NP-hard optimization problem, no efficient exact solution is currently available, so only approximate solutions can be considered. Jojic et al. proved that the convex envelope of ∥y∥_0 is exactly the ℓ1 norm; by convex analysis, it is therefore reasonable to replace ∥y∥_0 with ∥y∥_1, leading to the ℓ1 optimization problem

min_{y∈R^n} ∥y∥_1, s.t. Ay = b, (9)

where ∥y∥_1 = Σ_{i=1}^{n} |y_i|, the sum of the absolute values of the elements of the solution vector. The ℓ1 optimization problem is a convex optimization problem that can be solved efficiently with convex programming tools. Numerous experiments show that by reasonably weighting the ℓ1 norm and iteratively updating the weights, the performance of the ℓ1 optimization framework is greatly enhanced. Under the ℓ1 norm, larger coefficients are penalized more heavily than smaller ones, whereas the ℓ0 norm treats all nonzero coefficients equally. If a weight matrix is added to the ℓ1 optimization problem so that coefficients of different sizes are penalized equally, the weighted ℓ1 problem becomes arbitrarily close to the ℓ0 problem, yielding a sparser solution. This is the advantage of the weighted ℓ1 optimization framework, which can be expressed as

min_{y∈R^n} ∥Wy∥_1, s.t. Ay = b. (10)

By selecting an appropriate weight matrix W, the reweighted ℓ1 optimization problem obtains a sparser solution than the traditional ℓ1 problem, making the result closer to that of the ℓ0 problem. The question then becomes how to select W so that the reweighted ℓ1 problem yields a sparser solution while ensuring the result is correct.
According to the relevant proofs, the absolute value of each weight should be inversely proportional to the value of the corresponding element in the final solution. However, if the final solution is unknown, an appropriate weight matrix cannot be selected, and without an appropriate weight matrix the correct final solution cannot be obtained. A reasonable way out of this circularity is an iterative approach: set the weight matrix to the identity matrix at first and obtain an approximate solution of the reweighted ℓ1 optimization problem; in the next iteration, update the weight matrix according to the approximate solution just obtained; then repeat the process until a termination condition is satisfied.
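The iterative scheme above can be sketched with a linear-programming solver; the problem sizes, the toy data, and the weight update w_i = 1/(|y_i| + ε) are assumptions in the spirit of the reweighted ℓ1 literature, not the paper's exact settings:

```python
import numpy as np
from scipy.optimize import linprog

# Reweighted l1: solve min ||W y||_1 s.t. A y = b as a linear program,
# then update the weights from the previous solution and repeat.
def reweighted_l1(A, b, iters=4, eps=1e-3):
    m, n = A.shape
    w = np.ones(n)                     # first pass: W = identity matrix
    y = np.zeros(n)
    for _ in range(iters):
        # LP variables [y, t] with |y_i| <= t_i; minimize sum(w_i * t_i)
        c = np.concatenate([np.zeros(n), w])
        A_ub = np.block([[np.eye(n), -np.eye(n)],
                         [-np.eye(n), -np.eye(n)]])
        A_eq = np.hstack([A, np.zeros((m, n))])
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n),
                      A_eq=A_eq, b_eq=b,
                      bounds=[(None, None)] * n + [(0, None)] * n)
        y = res.x[:n]
        w = 1.0 / (np.abs(y) + eps)    # small coefficients get large weights
    return y

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 8))
y_true = np.zeros(8); y_true[2] = 1.5  # 1-sparse ground truth
y_hat = reweighted_l1(A, A @ y_true)
print(np.round(y_hat, 3))
```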
The mathematical model of the classical PCA algorithm can be written as

min_V ∥Y − YVV^T∥_F^2, s.t. V^T V = I, (11)

where the matrix Y is an n × c matrix formed by the original data, the matrix V is a set of orthogonal bases, and V^T represents the transpose of the matrix V. The principal component analysis algorithm reduces the dimension of the original data by a basis transformation. The weighted sparse principal component analysis algorithm proposed in this paper adds the weighted ℓ1 optimization framework and the LASSO regression model to this model:

min_{G,H} ∥Y − YHG^T∥_F^2 + λ Σ_{i=1}^{d} ∥W h_i∥_1, (12)

in which Y is an n × c matrix; G and H are orthogonal matrices of order c × d, with G = [g_1, g_2, …, g_d] and H = [h_1, h_2, …, h_d]; G^T represents the transposition of the matrix G; W is a diagonal weight matrix of order c × c; and λ is the regularization coefficient. Expanding the norm in formula (12) yields formula (13), where W is the weight matrix and h_i is a column vector of the matrix H. From the relationship between eigenvalues and eigenvectors, formula (14) is obtained, where ε is the eigenvalue corresponding to the vector e_i. Substituting formula (14), formula (13) can be expanded as follows:

s.t. G^T G = I, H^T H = I. (15)

The mathematical model shown in Equation (15) can be solved by the alternating minimization method. The idea is to assume two initial matrices and iterate: during each iteration, the matrix produced by the previous iteration is used to solve for the matrix of the current iteration. Starting from an arbitrary (G_0, H_0), with the subscript denoting the iteration number, an initial matrix G_0 or H_0 is given and iterative calculation proceeds. At the pth iteration, given the matrix G_(p−1), it is used to solve for the matrix H_(p); then H_(p) is used to solve for G_(p). The process repeats until the number of iterations meets a preset stopping condition.

First, consider the case where the orthogonal matrix G is known. Then only the matrix H needs to be solved in Equation (15), which reduces the problem to the form of Equation (16). To solve it, a new orthogonal matrix G⊥ is introduced such that [G G⊥] is an orthogonal matrix of order c × c. Projecting the rows of Y − YHG^T onto G and G⊥ yields formula (17). Because ∥YG⊥∥_F^2 is independent of H, it need not be considered when solving for H, so formula (17) reduces to formula (18). Formula (18) is an elastic net regression problem, which can be solved with the LARS-EN (least angle regression-elastic net) algorithm [16]. At this point, H can be solved whenever G is given; the remaining problem is how to solve G when H is given.
When the orthogonal matrix H is given, the orthogonal matrix G needs to be solved; the mathematical model in formula (15) then reduces to the problem in formula (19), which can be solved by singular value decomposition. The overall algorithm is as follows:

Step 1. Compute the ordinary principal components of the data matrix Y.

Step 2. Initialize the matrix G to the principal component vectors calculated in Step 1, G = [α_1, α_2, …, α_k].

Step 3. Given the matrix G, use the LARS-EN algorithm to solve formula (18) and obtain the matrix H = [β_1, β_2, …, β_k].

Step 4. Using the calculated matrix H, update the matrix G by singular value decomposition according to formula (19).

Step 5. Repeat Steps 3 and 4 until the termination conditions are met, and the final result is obtained.

The Word2vec model is trained with a dimension of 400, so the obtained word vectors are also 400-dimensional. Maintaining such high dimensionality burdens the training and prediction of the deep learning model. In this paper, the reweighted SPCA algorithm is used to reduce the dimension of the word vectors and thus of the dataset. The basic features are then selected to minimize information loss while compressing the data.
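Under simplifying assumptions (soft-thresholding in place of the LARS-EN elastic-net solve, a fixed λ, identity weights, a fixed iteration count), the alternating steps can be sketched as:

```python
import numpy as np

# Alternating scheme: given G, update H (soft-threshold stands in for the
# LARS-EN elastic-net solve of the text); given H, update G from the SVD
# of Y^T Y H. Sizes, lambda, and the stopping rule are toy assumptions.
def sparse_pca(Y, d, lam=5.0, iters=50):
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    G = Vt[:d].T                                   # Steps 1-2: ordinary PCs
    for _ in range(iters):
        H = Y.T @ Y @ G                            # Step 3 (stand-in solve)
        H = np.sign(H) * np.maximum(np.abs(H) - lam, 0.0)
        U, _, Wt = np.linalg.svd(Y.T @ Y @ H, full_matrices=False)
        G = U @ Wt                                 # Step 4: SVD update of G
    norms = np.linalg.norm(H, axis=0)
    norms[norms == 0] = 1.0
    return H / norms                               # sparse loading vectors

rng = np.random.default_rng(2)
Y = rng.normal(size=(40, 10))                      # 40 samples, 10 features
H = sparse_pca(Y, d=3)
print(H.shape, int((np.abs(H) < 1e-12).sum()), "zero entries")
```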
The relationship between word vector dimension and variance value of principal components is shown in Figure 2, which indicates that the first 100 dimensions can already contain most of the information of the original data, so the first 100 dimensions of word vector data are selected as the input of the model for training.
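The choice of 100 dimensions corresponds to a cumulative explained-variance criterion, which can be sketched as follows (the synthetic data and the 95% threshold are illustrative assumptions, not the paper's figures):

```python
import numpy as np

# Keep the smallest number of components whose cumulative explained
# variance ratio exceeds a threshold (cf. Figure 2).
def choose_dim(X, threshold=0.95):
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = s ** 2 / np.sum(s ** 2)
    return int(np.searchsorted(np.cumsum(ratio), threshold) + 1)

rng = np.random.default_rng(3)
scales = np.exp(-np.arange(400) / 30.0)   # variance concentrated up front
X = rng.normal(size=(500, 400)) * scales  # 500 synthetic 400-dim vectors
print(choose_dim(X))                      # leading dimensions to keep
```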

2.3. Deep Learning Model.
Text is transformed into distributed word vectors based on word2vec, and the dimension-reduced word vectors obtained by the reweighted SPCA algorithm serve as the input data of the deep learning model. Text is time-series data: the occurrence of each word depends on the words before and after it. Because of this dependence, recurrent neural networks are generally selected for training.
The structure of a recurrent neural network (RNN) differs from that of a general neural network, which usually consists of an input layer, one or more hidden layers, and an output layer. The recurrent neural network adds a recurrent structure on top of this three-layer structure (see Figure 3). The left side of the equal sign is the folded form; the right side is its expansion. Xt is the input layer, Ht is the output layer, and A is the hidden layer. Each A can be regarded as a neuron that first stores the previously input state; after the operation, some relations between the current input and previous inputs are retained, giving the network a "memory" function. In this way, previously computed information is captured and the influence of earlier inputs on later data is retained, preserving the temporal structure of the data.
The long short-term memory network (LSTM) evolved from the RNN. By deliberately designing the calculation of the hidden-layer state, LSTM avoids the problem of long-term dependence. Both RNN and LSTM have a chain structure of repeating neural network modules; however, within each repeating LSTM module there are four neural network layers that interact in a special way (see Figure 4).
In LSTM, the "memory" is called the cell state. The cell state runs along the chain structure like a conveyor belt, undergoing only a small amount of linear operation, which keeps information faithful as it flows through the chain. In addition, LSTM adds or removes information through structures called gates, each consisting of a sigmoid neural network layer and a pointwise multiplication operation, to capture long-term dependencies. The sigmoid output ranges over 0~1 and controls how much of each information component passes. LSTM has three gates for protecting and controlling the cell state. The first is the "forget gate," implemented by a sigmoid layer, which selectively forgets information in the cell state. The second is the "input gate": its sigmoid layer determines which values to update, and the candidate vector Ct created by a subsequent tanh layer selectively records new information into the cell state. The third is the "output gate," which acts on the hidden state ht: a sigmoid layer determines which part to output; the cell state then passes through a tanh layer, producing values in (-1, 1), which are multiplied by the sigmoid output to determine the information to be output. Through these three gates, information in memory is selectively retained or deleted, and the previous state, current memory, and input are combined. This structure has proved very effective at capturing long-term dependence. Compared with a single LSTM model, the Bi-LSTM model utilizes both the forward and the reverse correlation information between the data before and after each point in the time series, and therefore shows superior performance in the classification of time series.
In this paper, Bi-LSTM model is selected as the training model.
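The attention pooling and softmax head that follow the Bi-LSTM can be sketched in isolation; the hidden states and all weights below are random, untrained stand-ins, not the paper's trained model:

```python
import numpy as np

# Attention over a sequence of hidden states, then a softmax over emotion
# classes. H stands in for Bi-LSTM outputs; u and W are untrained stand-ins.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(H, u):
    scores = softmax(np.tanh(H) @ u)   # one attention score per time step
    return scores @ H                  # weighted sum of hidden states

rng = np.random.default_rng(4)
T, hidden = 12, 8                      # sequence length, 2 x LSTM units
H = rng.normal(size=(T, hidden))       # stand-in Bi-LSTM output sequence
u = rng.normal(size=hidden)            # attention context vector
W = rng.normal(size=(hidden, 3))       # 3 emotion classes
probs = softmax(attention_pool(H, u) @ W)
print(np.round(probs, 3))              # class probabilities, summing to 1
```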

2.4. The Emotional Calculation Method of Improved SPCA-LSTM. The overall framework of the improved SPCA-LSTM emotion calculation method proposed in this paper is shown in Figure 5.
(1) Input layer: emotional text data of rural tourists.

(2) Text representation layer: words are embedded for vector representation by combining Word2vec with the TF-IDF-ICP algorithm.

(3) Dimension reduction layer: the word vectors are reduced in dimension by the reweighted SPCA algorithm, reducing the dimension of the dataset; the reduced vectors are used as the input of the deep model.

(4) Bi-LSTM layer: Bi-LSTM extracts the context semantics of the text. A unidirectional LSTM captures only past state information when computing the current output, but in many problems the current output depends on future as well as past information, so the Bi-LSTM structure is needed. Bi-LSTM consists of two LSTMs running in opposite directions; the output at each time step is determined jointly by the forward and reverse outputs.

3.2. Experimental Parameters. The settings of the model parameters in this paper are listed in Table 1 and Table 2.

3.3. Experimental Results and Comparison. First, to verify the performance of the proposed deep-neural-network emotion calculation method, it is compared with an emotion calculation method based on an emotion dictionary and with methods based on machine learning. The specific comparison methods are the emotion-dictionary-based ER (emotion rules), the machine learning method SVM (support vector machine), and the machine learning method NBC (naive Bayesian classifier). The test results of the emotion models are shown in Table 3. As Table 3 shows, the machine-learning-based tourist sentiment calculation methods outperform the sentiment-dictionary-based method, because machine learning uses statistical methods to extract feature items from the text data, and its nonlinear characteristics improve the reliability and accuracy of emotion calculation. Among the machine learning methods, NBC is better than SVM on all evaluation indexes, and its performance on the training and test sets is relatively stable, because NBC, as a classic classification model in machine learning, has a solid mathematical foundation and stable classification efficiency.
Compared with ER, SVM, and NBC, the accuracy, recall, and F1 values of the improved SPCA-LSTM algorithm in this paper are greatly improved. The main reason is that Bi-LSTM model with attention mechanism is intelligent in text data feature extraction and learning methods. Deep learning relies on big data and many parameters to automatically fit nonlinear prediction functions, emphasizing the depth of model structure and highlighting the importance of feature learning. At the same time, the feature representation of samples in the original space is transformed into a new feature space by feature transformation layer by layer. Compared with the method of artificial feature construction by machine learning, the algorithm model in this paper fully embodies its advantages of big data.
In addition, word2vec uses high-dimensional vectors to convert words into real vectors and accurately retain their semantic information. Then, reweighted SPCA is used to reduce the vector dimension and reduce the computation of the deep learning model. The deep learning model not only gives full play to the strong processing ability of deep learning for high-dimensional data but also preserves the good timing of tourism text data. Therefore, the proposed model achieves better results than those based on emotion dictionary and machine learning.
In order to verify the effectiveness of this algorithm, this paper selects several deep learning models for comparison. The experimental results of each model are shown in Table 4.
The accuracy of this model is 5.44% higher than that of the traditional convolutional neural network. The precision rate is 5.0% higher. The recall rate is 5.49% higher. The F1 value is 5.19% higher. The highest accuracy rate of other text classification models is 96.7%. The highest precision rate is 96.5%. The highest recall rate is 96.6%. The highest F1 value is 96.6%. The accuracy of this classification model is 0.65% higher. The precision rate is 1.4% higher. The recall rate is 0.59% higher. The F1 value is 0.89% higher. Experimental data show the superiority of this method.

Conclusion
Introducing the results of big data analysis of rural tourism into rural tourism development planning will provide extremely beneficial strategic guidance for realizing the rural revitalization strategy. To realize such analysis, this paper proposes a rural tourism sentiment calculation method based on the improved SPCA-LSTM algorithm, which uses the improved TF-IDF + Word2vec model to represent rural tourism data in vector form. The vector dimension is reduced by reweighted SPCA. Then, a Bi-LSTM model with an attention mechanism is used to extract text features. Finally, the softmax function is applied to calculate the emotion of rural tourists. The experiments demonstrate that the proposed algorithm is feasible and effective for sentiment analysis of rural tourists. As a next step, a tourism-specific emotion dictionary could be constructed and combined with machine learning and deep learning methods to study tourist emotion computing.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The author declares no competing interests.