Research on Students' Mental Health Based on Data Mining Algorithms

With the diversification and rapid development of society, people's living conditions, study and social relationships, and employment prospects are under increasing pressure, which greatly challenges their psychological endurance. Strengthening students' mental health education has therefore become an urgent social need and a widely shared concern. To address the high misjudgment rate and low efficiency of current intelligent mental health evaluation, a mental health intelligent evaluation system based on a joint optimization algorithm is proposed. The joint optimization algorithm consists of an improved decision tree algorithm and an improved ANN algorithm. First, the current state of research on intelligent mental health evaluation is analyzed and the framework of the evaluation system is constructed. Then, evaluation data are collected through data mining, and the joint optimization algorithm is used to analyze and classify the data and obtain the evaluation results. Finally, simulation experiments are used to analyze the feasibility and superiority of the system. The results show that the proposed system overcomes the shortcomings of existing mental health intelligent evaluation systems and improves both the accuracy and the efficiency of evaluation. It also has good stability and can meet the practical requirements of mental health intelligent evaluation.


Introduction
With the rapid development of modern society, the pressure to survive has also risen. The rising pressure of social competition [1, 2] has caused more and more people to develop emotional and behavioral problems. Every year, psychological barriers [3] and even mental suffering appear, and the number of mentally ill people has begun to grow substantially. Resolving the psychological problems [4] caused by these factors has become a pressing task. The directions that need to be studied are how to provide people with mental health services [5] and how to improve the results of mental health work [6]. In response to these problems, it is necessary to use innovative ideas and concepts, combined with modern information technology, to build a mental health intelligent evaluation system [7] that raises the scientific level of mental health work. Some scholars [8][9][10] have built a mental health service system for the elderly based on internet technology, and others have built an automatic evaluation system for the mental health of online forum users based on multifeature fusion. These systems have improved the collection and management of mental health data, but their functions are too one-sided and their applicability is weak. Traditional mental health assessment methods [11][12][13][14] are mostly carried out in the form of scales or questionnaires, which limits both the range of survey subjects and the accuracy of the survey results. With the rise of internet technology, applying network technology to mental health assessment has become a new trend. Letting respondents assess themselves in a more private environment can reduce their psychological pressure.
Popular internet software records user status to varying degrees, so an internet-based evaluation system is a highly applicable way of evaluating the mental health of college students [15]. Papineni et al. [16] proposed an automatic evaluation method for the mental health of online forum users. They designed an evaluation framework, hierarchically analyzed the data reflecting users' mental health, and used different methods to fuse the psychological data into a more comprehensive evaluation, but the system takes a long time to run. Soet [17] designed a web-based mental health consulting service system for college students, carefully analyzed user needs, and divided the system into functional modules to provide users with better psychological services, but its recall rate is not high. David et al. [18] took the elderly as the research object, applied different theoretical models to conduct mental health tests on this group, evaluated the advantages and disadvantages of the models, and integrated their advantages. A mental health assessment method [19] was proposed on this basis, but its accuracy needs to be improved. Yang et al. [20] also studied the elderly. They analyzed the system requirements and designed an overall mental health assessment framework whose components were analyzed one by one, but the operating efficiency of the system leaves room for improvement. With the continuous improvement of the ability to collect, analyze, and process data in the era of big data, it is no longer necessary to sample data with paper questionnaires. Computers combined with big data technology can capture a wide range of data dynamics accurately. In past research, schools mainly used paper questionnaires to measure students' mental health [21]. This method consumes manpower and material resources and is costly.
Using big data technology, mental health assessment can be extended to a larger scope. Students' internet use generates massive amounts of data, and big data technology [22] can be used to build a mental health evaluation system on top of it. Combined with established mental health scales, such a system can track the psychological dynamics of students and predict their mental health more accurately.
The application of machine learning [23][24][25] in the field of mental health has gradually become an important development trend. Machine learning is the process of automatically analyzing or predicting new, unseen data by discovering regularities in a large amount of existing data. Tom Mitchell gave a formal definition of machine learning: a computer program is said to learn from experience E with respect to a task T and a performance measure P if its performance on T, as measured by P, improves with experience E. The data types currently available for machine learning are as follows. (1) Text data [26]: its sources include social media, clinical assessments and records, electronic health records, and diaries. Social media in particular can provide a large amount of data reflecting traces of individual psychology and behavior, which gives it higher ecological validity. At present, such data are mainly used for predicting suicide risk, analyzing personality traits, and predicting mental health status and subjective well-being.
(2) Survey data [27]: demographic variables, psychological scales, and authoritative public statistics are commonly combined to predict and diagnose personality types and mental disorders. (3) Brain imaging data [28]: traditional analysis of brain imaging data relies heavily on expert judgment, is time-consuming and laborious, and has a high rate of misdiagnosis or missed diagnosis. The accuracy of automatic classification using machine learning is better than that of expert judgment. Currently, such data are commonly used in the prediction of cognition, memory, emotions, and addictive behaviors. (4) Behavioral and physiological data [29]: these mainly come from eye tracking, portable mobile devices, multichannel physiological recorders, and expression analysis systems. Machine learning algorithms are divided into three categories: (1) supervised learning [30,31]; (2) semisupervised learning [32]; (3) unsupervised learning [33]. Supervised learning trains a model on labeled data, that is, data with both features and labels, to learn the relationship between them, so that the model can predict the label of new data. For example, a machine learning model [34] trained on the brain imaging data of traumatic brain injury patients and normal individuals can automatically identify whether an individual is affected. Unsupervised learning discovers structure in unlabeled data through training. Semisupervised learning lies between the two: in many cases labeled data sets are difficult to obtain, so the model's predictive ability is enhanced by also using unlabeled data. It should be understood, however, that no algorithm can solve every problem well. Hence, it is necessary to compare multiple algorithms to determine which one is most suitable for a particular data set and task.
Commonly used machine learning algorithms include decision trees [35], naive Bayes [36], support vector machines [37], K-nearest neighbors [38], logistic regression [39], multilayer perceptrons [39], random forests [40], and K-Means [41].
To improve the effectiveness of mental health intelligent evaluation, this paper proposes a mental health intelligent evaluation system based on data mining. The system supports information management, addresses problems such as outdated system resources, makes full use of the advantages of the internet, and helps address the psychological problems of contemporary people.

Related Work
The goal of psychology [42] is to describe, explain, predict, and control behavior. At present, most psychological research focuses on describing and explaining the relationships between variables, and research that truly achieves the goal of prediction is uncommon. Traditional psychological research [43] is limited by small sample sizes, and the results of some studies are contradictory or uncertain because of low data quality and missing covariate information. In addition, significance testing based on p values faces a replication crisis, that is, the irreproducibility of research results. Strict and systematic use of cross-validation in machine learning [44] has great potential for improving the reproducibility of psychological research. Machine learning can construct models from massive amounts of data, identify the underlying regularities of the data more accurately, and generalize better. A model can then be applied to different samples or groups with smaller prediction errors and more accurate results.
Many researchers try to use machine learning to predict complex individual psychological problems, such as stress disorders and anxiety disorders. Tian et al. [45] and Zhu et al. [46] found that the number of effective features for predicting the possibility of suicide can reach the thousands. They used a multilayer perceptron algorithm to build a suicide risk recognizer for the Weibo social media platform that evaluates the suicide possibility of Weibo users in real time, with a prediction accuracy of about 94%. Carpenter [47] used preschool psychiatric assessment data and machine learning to assess the risk of anxiety in children aged 2 to 5 years, and the prediction accuracy for generalized anxiety disorder and separation anxiety disorder was as high as 90%.
Moreover, machine learning usually does not presuppose a relationship between independent and dependent variables. It evaluates all candidate independent variables, gives prediction results, and reports the important independent variables. Walsh [48] used a random forest model to predict future suicidal behavior in patients who had attempted suicide and in normal subjects. The independent variables included demographic variables, medical records, and socioeconomic status, and the prediction accuracy reached about 89%. In the long-term prediction window, a history of suicide attempts in hospital visit records was the most important predictor; in the short-term prediction window, dependence on addictive drugs was an important predictor. Related work indicates that the activation of the amygdala and the hippocampus during a cognitive reappraisal task, together with amygdala gray matter volume, predicts borderline personality disorder well, with accuracy and sensitivity above 70%. Accurate predictions of this kind are crucial for developing and selecting personalized mental health interventions. To give people an accurate understanding of their mental health and to make mental health guidance more scientific and information-based, this paper constructs a mental health intelligent evaluation system based on a decision tree algorithm and scientific mental health evaluation tools.
The tools provide a comprehensive and objective reflection of the user's mental health. The overall structure of the system is shown in Figure 1.

Entropy for Preprocessing.
The client software of this system obtains evaluation results by mining psychological evaluation data. It processes the collected user evaluation data, stores them in a database, and analyzes them with the decision tree algorithm to produce the evaluation. Psychological assessment data mining includes extracting data from the database, cleaning the data, selecting the mining model, and outputting the results. The complete data mining process is shown in Figure 2. In this paper, the system uses the users' answers to establish an initial data set. After preprocessing (data integration, data extraction, data cleansing, and data conversion), a mental health assessment data set ready for mining is obtained. The decision tree algorithm is applied in the mining process, and the resulting classification gives the specific category of the user's mental health status. The decision tree algorithm is an inductive learning algorithm that derives classification rules from a set of unordered examples. A decision tree handles a classification problem in two steps. The first is to generate a decision tree classification model by learning from the training set; the second is to classify samples of unknown type with the model. When a sample is classified, the root node is the starting point, a leaf node is the endpoint, and the sample's attributes are tested one by one along the branches in the downward direction. In this paper, the C4.5 decision tree algorithm is applied. Its splitting index is the information gain rate, which corrects the bias toward attributes with many values that arises when information gain is used to select test attributes. The information gain rate is defined as

$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInformation}(S, A)}, \tag{1}$$

where Gain(S, A) is the information gain of the attribute A and SplitInformation(S, A) measures the breadth and uniformity of splitting the sample set S according to the attribute A. The C4.5 decision tree algorithm is used to classify the mental health intelligence assessment data to provide data support for the system in this paper. In the process of decision tree generation, the most important thing is to determine the splitting index. In the C4.5 algorithm, the information gain rates of the attributes in the training sample data are compared, and the attribute whose information gain rate is the largest and is not lower than the average over all attributes is selected as a branch node of the decision tree. For continuous descriptive attributes, the continuous range must first be divided into a discrete set of intervals. After discretization, the information gain rate of every attribute in the candidate attribute set is computed, and the attribute with the largest information gain rate becomes the test attribute. The sample set is divided by all possible values of that attribute into several subsample sets, and the same method is applied recursively to each subsample set until no further division is possible, at which point the decision tree is complete. When the C4.5 algorithm generates a decision tree, the selected target class affects the determination of each class. When evaluating the decision tree, information entropy is needed to obtain the information gain, which is the effective reduction of information entropy. The information entropy also determines the level at which each variable is used for classification.
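The information gain rate and the choice of test attribute can be illustrated with a minimal Python sketch. It handles discrete attributes only and uses the standard C4.5 definitions of entropy and split information; the toy survey attributes (`sleep`, `stress`) and labels are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain rate of one discrete attribute.

    rows   -- list of dicts mapping attribute name -> value
    labels -- class label for each row
    attr   -- name of the attribute to evaluate
    """
    n = len(rows)
    # Partition the class labels by the value the attribute takes.
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr], []).append(label)
    # Conditional entropy of the classes after the split.
    cond = sum(len(p) / n * entropy(p) for p in parts.values())
    gain = entropy(labels) - cond
    # Split information penalises attributes with many distinct values.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: two questionnaire attributes and a binary label.
rows = [
    {"sleep": "poor", "stress": "high"},
    {"sleep": "poor", "stress": "low"},
    {"sleep": "good", "stress": "high"},
    {"sleep": "good", "stress": "low"},
]
labels = ["at_risk", "at_risk", "healthy", "healthy"]

# The test attribute is the candidate with the largest information gain rate.
best = max(["sleep", "stress"], key=lambda a: gain_ratio(rows, labels, a))
```

In this toy set `sleep` separates the two classes perfectly while `stress` carries no information, so `sleep` is chosen as the test attribute. A full tree builder would apply the same selection recursively to each subsample set.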
If the training set S contains two classes, class A and class B, with x records belonging to class A and y records belonging to class B, then the amount of information needed to determine the class of a record in S is

$$I(x, y) = -\frac{x}{x+y}\log_2\frac{x}{x+y} - \frac{y}{x+y}\log_2\frac{y}{x+y}.$$

Suppose variable C is the root node of the decision tree and divides the training set S into subsets $S_1, S_2, \ldots, S_k$, where $S_i$ (i = 1, 2, ..., k) includes $x_i$ records of class A and $y_i$ records of class B. The amount of information over all subsets is

$$E(C) = \sum_{i=1}^{k}\frac{x_i + y_i}{x + y}\, I(x_i, y_i).$$

Suppose variable C is chosen as the classification node of the decision tree because its information increment is the largest among all variables. The information increment of variable C is

$$\mathrm{Gain}(C) = I(x, y) - E(C).$$

More generally, for class proportions $P_1, \ldots, P_k$, the information function is defined as

$$\mathrm{Info}(S) = -\left(P_1 \log_2 P_1 + \cdots + P_k \log_2 P_k\right).$$

Improved Decision Tree.
The basic principle of the C4.5 algorithm shows that attribute selection during tree generation is grounded in information theory. Because the information gain rate formula involves many logarithm evaluations, the library function has to be called many times during calculation, which greatly increases the computation time. In response to this problem, an improved method for calculating the information gain rate is proposed: the Taylor formula, in its Maclaurin form, is used to simplify the calculation of the information gain rate of the C4.5 algorithm, which greatly reduces the computational cost. The improved C4.5 algorithm is named the TAM-C4.5 algorithm.
Since ln(x) and its derivatives are undefined at x = 0, and the probability values in the information gain rate formula lie in [0, 1], this paper uses the Maclaurin formula of ln(x + 1) to improve the information gain rate formula of the traditional C4.5 algorithm:

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1}\frac{x^n}{n} + R_n(x).$$

When x ∈ (0, 1), the series is truncated to obtain an approximation of the logarithm. Through this approximate simplification, logarithmic operations are converted into nonlogarithmic operations, which eliminates the complex logarithmic operations in the information gain rate formula, simplifies the calculation, and improves the efficiency of tree construction. The category information entropy, the conditional information entropy, and the split information entropy are each converted in this way, and the converted terms are substituted into the information gain rate formula. Analyzing the improved formula shows that the category information entropy is the same every time the information gain rate of a condition attribute is calculated. The constant factor 1/ln 2 is omitted in each part of the simplified formula. To preserve the classification accuracy of the algorithm, the improved formula is used when calculating the category conditional entropy in a way that keeps the ranking of the information gain rates of the condition attributes unchanged, so that classification accuracy is not affected. The traditional C4.5 algorithm needs to call a function to perform a large number of logarithmic operations. The improved algorithm proposed in this paper needs only the four basic arithmetic operations, which eliminates the frequently called logarithmic operation in the information gain rate formula and greatly improves the operating speed of the system.
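The effect of replacing logarithms with arithmetic can be illustrated with a short sketch. It assumes a first-order truncation, ln p = ln(1 + (p − 1)) ≈ p − 1, with the constant factor 1/ln 2 dropped. The truncation order is an assumption made here for illustration and may differ from the one used in TAM-C4.5. Under this assumption the entropy surrogate becomes Σ p(1 − p), which coincides with the Gini impurity.

```python
import math

def exact_entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def approx_entropy(probs):
    """Log-free surrogate (assumed first-order truncation).

    Uses ln p = ln(1 + (p - 1)) ~ p - 1 and drops the factor 1/ln 2,
    so -sum(p * log2 p) is replaced by sum(p * (1 - p)).
    """
    return sum(p * (1.0 - p) for p in probs)

# Hypothetical two-class distributions, from most to least mixed.
distributions = [[0.5, 0.5], [0.7, 0.3], [0.9, 0.1], [1.0, 0.0]]

# Rank the distributions by impurity under both measures.
exact_rank = sorted(range(len(distributions)),
                    key=lambda i: exact_entropy(distributions[i]))
approx_rank = sorted(range(len(distributions)),
                     key=lambda i: approx_entropy(distributions[i]))
```

On this toy example both measures rank the distributions in the same order, which is the property the improved formula relies on: the absolute values change, but the ordering of candidate attributes is intended to stay the same.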

ANN for Prediction. An ANN is a structure and calculation model that imitates a biological neural network. It is usually used to estimate or approximate a function. The neural network is mainly composed of an input layer, hidden layers, and an output layer. In practice, each neuron in the input layer represents a feature, and the output layer has a single neuron. If the Softmax classifier shown in Figure 3 is used, the output layer has two neurons, one per class, and the task is treated as a multiclass problem. The number of hidden layers and the number of neurons in each hidden layer are set manually. Figure 4 shows a basic neural network model. In the MP neuron model, a neuron receives input signals from other neurons through weighted connections. The total input received by the neuron is then compared with the neuron's threshold, and the result is passed through the activation function to produce the neuron's output.
Ideally, the activation function would be a step function with two output values, 0 or 1, where 0 means that the neuron is not excited and 1 means that it is excited. However, the step function is discontinuous and not smooth, so the sigmoid function is usually chosen as the activation function:

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}.$$

The graph of the function shows that its value range is (0, 1); that is, the function value always falls between 0 and 1. The sigmoid function compresses input values that vary over a wide range into the interval (0, 1), so it is also called a squashing function. Connecting many such neurons in layers yields a neural network.
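The MP neuron with a sigmoid activation can be sketched in a few lines of Python. The function names, weights, and threshold below are illustrative assumptions, not values from the paper.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid, mapping any real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)

def neuron_output(inputs, weights, threshold):
    """MP neuron: weighted sum of inputs, compared with the threshold,
    then squashed by the activation function."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total - threshold)

# Hypothetical example: two input features with illustrative weights.
out = neuron_output([1.0, 0.5], [0.4, 0.2], threshold=0.5)
```

Here the weighted input equals the threshold, so the neuron sits exactly at the midpoint of the sigmoid and outputs 0.5.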

The Cross-Entropy Cost Function to Optimize Backpropagation. When using this method, the raw output a of the network is first transformed by the softmax function and then substituted into the cost function. The softmax function and the categorical cross-entropy are calculated as follows:

$$s_k = \frac{e^{a_k}}{\sum_{j=1}^{n} e^{a_j}}, \qquad L = -\sum_{k=1}^{n} y_k \ln s_k.$$

To reduce the classification loss, the weights are updated by one step of gradient descent. This requires the partial derivative of the loss with respect to each weight matrix, which is obtained with the chain rule. The first step in this process is to differentiate the softmax function.
The error is calculated with respect to the output a before the softmax transformation; that is, the derivative with respect to a is required. According to the chain rule, the derivative of the error function with respect to the softmax output is multiplied by the derivative of the softmax output with respect to a:

$$\frac{\partial L}{\partial a_i} = \sum_{k=1}^{n} \frac{\partial L}{\partial s_k}\,\frac{\partial s_k}{\partial a_i}, \qquad \frac{\partial s_k}{\partial a_i} = s_k\left(\delta_{ki} - s_i\right),$$

where $\delta_{ki}$ equals 1 when k = i and 0 otherwise. Among them, the derivative of the loss function with respect to the softmax layer is

$$\frac{\partial L}{\partial s_k} = -\frac{y_k}{s_k}, \qquad s_k = \frac{e^{a_k}}{e^{a_1} + e^{a_2} + e^{a_3} + \cdots + e^{a_n}}, \tag{15}$$

where $y_k$ is the expected output value of the k-th neuron and $s_k$ is the output of the k-th neuron after the softmax transformation, namely, $y_{out}$. Combining the two factors, and using $\sum_k y_k = 1$, gives $\partial L / \partial a_i = s_i - y_i$.
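The chain-rule result for softmax with cross-entropy can be checked numerically. The sketch below is an illustration with hypothetical logits and a one-hot target; it compares the closed-form gradient (softmax output minus target) with a central finite-difference estimate.

```python
import math

def softmax(a):
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, s):
    """Categorical cross-entropy between target y and prediction s."""
    return -sum(yk * math.log(sk) for yk, sk in zip(y, s) if yk > 0)

def grad_wrt_logits(y, a):
    """Closed-form gradient of the loss with respect to the raw outputs a."""
    return [sk - yk for sk, yk in zip(softmax(a), y)]

# Hypothetical raw outputs and one-hot target.
a = [2.0, 1.0, 0.1]
y = [1.0, 0.0, 0.0]
analytic = grad_wrt_logits(y, a)

# Central finite differences as an independent check.
eps = 1e-6
numeric = []
for i in range(len(a)):
    up, down = a[:], a[:]
    up[i] += eps
    down[i] -= eps
    diff = cross_entropy(y, softmax(up)) - cross_entropy(y, softmax(down))
    numeric.append(diff / (2 * eps))
```

The two gradients agree to within the finite-difference error, and the components of the analytic gradient sum to zero because both the softmax output and the target sum to one.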

Improved ANN Algorithm.
Based on its three-layer perceptron structure and nonlinear optimization capability, an ANN can approximate any nonlinear function. However, the ANN also has areas that need to be improved.
(1) Training requires many iterations, and convergence is relatively slow. Even for simple and common problems, the ANN algorithm needs a lot of learning and training to converge, and for more complex problems the training time is longer still.
Minimizing the loss function with gradient descent tends to produce a zigzag descent path, which lowers the efficiency of the algorithm. To address this problem and improve the convergence speed of the ANN, a momentum term can be added, the error function can be improved, the learning rate can be changed adaptively, and a steepness factor can be used.
(2) Training easily falls into a local optimum, and the global optimum cannot be guaranteed. The backpropagation algorithm uses gradient descent, and the error surface over the weight space is not a simple paraboloid with a single minimum but has multiple minimum points. Therefore, different training starting points can lead to different solutions, none of which is guaranteed to be optimal. Many improvements to the ANN have been studied. The more commonly used optimization algorithms are the additional momentum algorithm, the variable algorithm, the adaptive learning rate method, the RPROP method, the conjugate gradient algorithm, the Newton algorithm, and the Levenberg-Marquardt (LM) algorithm. Among these algorithms, the LM algorithm has the fastest convergence speed and the best robustness.
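In the Newton algorithm, if the Hessian matrix is not positive definite, the Newton direction may not be a descent direction. This can be remedied by adding a positive definite matrix to the Hessian matrix. The LM algorithm is a combination of the gradient descent method and the Gauss-Newton method: it has the local convergence of the Gauss-Newton method and the global characteristics of the gradient descent method. The LM algorithm is also more stable and converges faster than the gradient method. In the ANN, the loss function is

$$E(w) = \frac{1}{2}\sum_{i=1}^{N} e_i^2(w) = \frac{1}{2}\, e(w)^{T} e(w),$$

where e(w) is the vector of output errors and w is the vector of weights and thresholds. The updated weight is calculated as follows:

$$w_{k+1} = w_k + \Delta w.$$

With Newton's algorithm, the amount of change is calculated as follows:

$$\Delta w = -\left[\nabla^2 E(w)\right]^{-1} \nabla E(w), \qquad \nabla E(w) = J^{T}(w)\, e(w).$$

The Hessian matrix is approximated so that it is always invertible:

$$\nabla^2 E(w) \approx J^{T}(w) J(w) + \mu I,$$

where J is the Jacobian of e with respect to w, I is the identity matrix, and $\mu > 0$ is a damping factor. The LM algorithm thus improves the Gauss-Newton method, and the improved weight and threshold adjustment rule is

$$\Delta w = -\left[J^{T}(w) J(w) + \mu I\right]^{-1} J^{T}(w)\, e(w). \tag{20}$$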
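The LM adjustment rule can be sketched on a deliberately small problem. The example below is not the paper's network: it fits a hypothetical two-parameter linear model so that the damped normal equations can be solved by hand with a 2×2 inverse. The damping schedule (divide μ by 10 after an accepted step, multiply by 10 after a rejected one) is a common convention and is assumed here.

```python
def lm_fit(residual_fn, jacobian_fn, w, mu=1e-2, iters=20):
    """Levenberg-Marquardt for a two-parameter model (illustrative sketch).

    residual_fn(w) -> list of errors e_i
    jacobian_fn(w) -> list of rows [de_i/dw0, de_i/dw1]
    """
    def loss(v):
        return 0.5 * sum(e * e for e in residual_fn(v))

    for _ in range(iters):
        e = residual_fn(w)
        J = jacobian_fn(w)
        # Entries of J^T J + mu * I (symmetric 2x2 matrix).
        a11 = sum(r[0] * r[0] for r in J) + mu
        a12 = sum(r[0] * r[1] for r in J)
        a22 = sum(r[1] * r[1] for r in J) + mu
        # Gradient J^T e.
        g1 = sum(r[0] * ei for r, ei in zip(J, e))
        g2 = sum(r[1] * ei for r, ei in zip(J, e))
        # Solve (J^T J + mu I) d = -J^T e with the explicit 2x2 inverse.
        det = a11 * a22 - a12 * a12
        d1 = -(a22 * g1 - a12 * g2) / det
        d2 = -(a11 * g2 - a12 * g1) / det
        candidate = [w[0] + d1, w[1] + d2]
        if loss(candidate) < loss(w):
            w, mu = candidate, mu / 10.0   # accept: move toward Gauss-Newton
        else:
            mu *= 10.0                     # reject: move toward gradient descent
    return w

# Hypothetical data generated by t = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]
residuals = lambda w: [w[0] + w[1] * x - t for x, t in zip(xs, ts)]
jacobian = lambda w: [[1.0, x] for x in xs]

w_fit = lm_fit(residuals, jacobian, [0.0, 0.0])
```

A small μ makes the step close to the Gauss-Newton step, while a large μ shortens it toward a gradient descent step, which is the combination of the two methods described above. For a real network the 2×2 solve would be replaced by a general linear solver.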

Data Set.
This experiment uses the Student-Life data set and Reach Out online forum posts as data sources. The Student-Life data set was collected by researchers at Dartmouth College. It records the psychological perception data of 49 students over ten consecutive weeks, including academic data, online psychological test data, and questionnaire survey data. The Reach Out online forum posts include the posting user's information, posting time, likes, views, and so on. Five million data points are selected from each of the two data sets, for a total of 10 million data points. The data are divided into 10 equal parts, of which six are used for neural network model training and four for experimental testing.
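The ten-part split with six parts for training and four for testing can be sketched as follows. Shuffling before the split and the fixed seed are assumptions added for illustration; how the parts were actually formed is not stated.

```python
import random

def split_into_parts(samples, n_parts=10, n_train_parts=6, seed=0):
    """Divide samples into n_parts equal parts and use the first
    n_train_parts for training and the rest for testing.

    Shuffling with a fixed seed is an assumption of this sketch.
    Samples left over when the size is not divisible by n_parts are dropped.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    size = len(items) // n_parts
    parts = [items[i * size:(i + 1) * size] for i in range(n_parts)]
    train = [s for part in parts[:n_train_parts] for s in part]
    test = [s for part in parts[n_train_parts:] for s in part]
    return train, test

# Hypothetical stand-in for the data points: integer sample identifiers.
train, test = split_into_parts(range(100))
```

With 100 sample identifiers this yields 60 training and 40 test samples with no overlap.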

Evaluation Metric.
To show the superiority of our proposed method, four evaluation metrics are used to evaluate each algorithm, named MSE, MAE, RMSE, and MAPE.
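Let $y_i$ denote the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of samples. The MAE of normalized data is given by the following:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|.$$

The MSE of prediction is as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2.$$

The equation of MAPE is as follows:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|.$$

The RMSE is described as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}.$$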
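The four metrics can be computed directly from their standard definitions. The sketch below uses plain Python and hypothetical values; MAPE is reported as a percentage and assumes that no actual value is zero.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)

# Hypothetical actual and predicted values.
y_true = [2.0, 4.0]
y_pred = [1.0, 5.0]
scores = {
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
}
```

For these two points every absolute error is 1, so MAE, MSE, and RMSE are all 1.0, while MAPE is 37.5% because the same absolute error is relatively larger for the smaller actual value.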

Comparison with Other Methods. The following experimental indicators are selected for verification and analysis. Feature fusion accuracy rate: the accuracy of feature fusion is directly related to the accuracy of the system evaluation results, so this indicator is selected for analysis. System recall rate: the recall rate is the ratio of the relevant information returned by a system query to the total amount of relevant information in the system. It is tested to verify the performance of the system in this article. Running time: time consumption is an important indicator of system performance. The comparison results of feature fusion accuracy are shown in Figure 5.
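The recall rate defined above can be expressed as a short function. The sketch uses the standard information-retrieval definition, and the record identifiers are hypothetical.

```python
def recall(retrieved, relevant):
    """Fraction of relevant items that the system actually returned."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)

# Hypothetical record identifiers: the query returns 3 records,
# 2 of which are among the 4 relevant ones.
r = recall(retrieved={1, 2, 3}, relevant={2, 3, 4, 5})
```

Here two of the four relevant records are returned, so the recall rate is 0.5.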
According to the comparison curves of feature fusion accuracy for the different systems in Figure 5, the accuracy of the system in this paper is always the highest, with an average of about 90%. Among the other three systems, that of [16] has the higher feature fusion accuracy, reaching up to 80%, while the accuracy of [17] and [18] is low. The system in this paper performs multifeature fusion with a neural network and makes full use of the network's ability to process data in parallel, which yields more accurate feature fusion. The higher the recall rate, the better the system performance. Table 1 and Figure 6 show the comparison of the recall rate between the system in this paper and the systems in [16], [17], and [18].
The table data show that the system in this paper has a high recall rate. Under different test data volumes, its recall rate remains high and is much higher than that of the systems in the other references, which gives the system in this paper a clear advantage. Figure 6 shows that the data processing time of the system in this paper is consistently lower than that of the other two systems. As the amount of data increases, the data processing time of all three systems changes. The data processing time of comparison system 1 and comparison system 2 increases sharply with the amount of information, fluctuates widely, and is not stable. For the system in this paper, the increase in processing time is small, the curve is smoother, and the stability is good. After the data volume reaches 5 × 10 GB, the data processing time gradually stabilizes, which indicates that the system in this paper has high data processing efficiency and strong stability. Table 2 shows the values of each metric obtained in 10 experiments for each algorithm. To further demonstrate the superiority of the proposed algorithm, a histogram is used to compare the averages of the ten experiments. The error of the proposed algorithm is much smaller than that of the other two methods. In addition, the box plot in Figure 7 compares the robustness of the algorithms, with MSE as the comparison standard. The information entropy method combined with neural networks is far more robust to extreme values and outliers than the other methods.

Evaluation on Joint Optimization. As mentioned before, the proposed joint optimization model consists of the improved decision tree and the improved ANN. To verify the effectiveness of this joint model, a comparative experiment against each single improved model is conducted, and the result is shown in Figure 8. IDT denotes the improved decision tree, and IANN denotes the improved artificial neural network.
It can be seen from the figure that the joint optimization model obtains the best performance, with the lowest error indices and the highest accuracy. Its performance is better than that of either single improved model, which supports the soundness of the joint optimization model proposed in this paper. The IANN model performs better than the IDT model but still worse than the joint optimization model.
In addition, this paper improves the original decision tree and ANN with the corresponding algorithms. To verify the effectiveness of these improvements, a comparative experiment was also carried out, and the results are shown in Figures 9 and 10. DT denotes the original decision tree.
Compared with the original decision tree and ANN algorithms, introducing the information entropy improvement and the LM algorithm, respectively, effectively improves performance. The error indices show a downward trend, while the accuracy shows an upward trend, which demonstrates the effectiveness of the two improvements.

Conclusions and Outlook
To improve the effectiveness of mental health intelligent evaluation, this paper constructs a mental health intelligent evaluation system based on a joint optimization model, which consists of the improved decision tree algorithm and the improved ANN. The proposed model helps users understand their mental health, address their psychological problems, and enhance self-awareness. To make the system more functional and to support its further development, it is also necessary to improve the analysis of the evaluation results, prevent data loss during a power failure, add data backup and recording functions, and reduce wasted storage space. This article has conducted in-depth research on the perception of students' mental state based on network behavior data and has achieved certain results. However, much of the work is still imperfect and needs further discussion and research. Future research will address the following aspects. (1) The data source is not wide enough. This article only uses the network behavior data among students' on-campus behavior data, which does not fully represent student behavior. Therefore, the next step is to add other on-campus behavior data, such as library borrowing records and meal card consumption records, to the network behavior data when constructing a perception model of students' mental state.
(2) There are not enough feature dimensions. The feature dimensions currently used are constructed from only two indicators, regularity and dependence. The next step will be to explore new ways of constructing feature dimensions, including more intermediate variables based on network behavior data, to continuously enrich the feature dimensions of the sample data set. (3) The range of models considered is not large enough. The models used in this article are only two classification models in machine learning, and the currently more popular deep learning models are not used. The next step will be to run model experiments with deep learning network structures and compare their performance with that of the authors' model.

Conflicts of Interest
The authors declare that they have no conflicts of interest.