Influence Model of Analyzing the Effect of Mental Health Level Based on Big Data Mining System

In order to explore the effect of a big data mining system on analyzing mental health level, this paper proposes an influence model for analyzing the effect of mental health level based on a big data mining system. Through continuous testing and analysis, the main symptom affecting students' mental health is found to be obsessive-compulsive disorder. Therefore, taking obsessive-compulsive disorder as the classification target to view the model, the factor of compulsion in students' psychology occupies a relatively high proportion in this application. Anxiety, interpersonal relationship, and paranoia have a great impact on the target attribute, obsessive-compulsive disorder. The results showed that if the degree of anxiety = medium, there was a tendency of obsessive-compulsive disorder regardless of the degree of interpersonal relationship. If the anxiety level = none, then when the paranoia level = [mild, moderate], the obsessive-compulsive symptom level = mild, and when the paranoia level = none, it is related to interpersonal relationship and hostility. If the degree of paranoia = severe or extremely severe, the degree of obsessive-compulsive symptoms = none. If the degree of anxiety = light, there is a tendency of obsessive-compulsive disorder regardless of the degree of interpersonal relationship. If the degree of anxiety = severe, the degree of obsessive-compulsive symptoms = moderate. If the degree of depression = medium, the degree of anxiety = medium. If the degree of depression = none, then when the degree of terror = medium, the degree of anxiety = light, and when the degree of terror = [none, light, heavy], there is almost no anxiety. If the degree of depression = mild and the degree of obsessive-compulsive symptoms = none, there is no anxiety tendency. If the degree of depression = severe, the degree of anxiety = severe.
If the degree of depression = medium, the degree of interpersonal relationship = medium. If the degree of depression = none, then when the degree of terror = light and there is psychosis, the degree of interpersonal relationship = light. If the degree of depression = mild and there is obsessive-compulsive disorder, there are problems in interpersonal relationship. The data analysis of mental health problems has been greatly improved, verifying the reliability of the application of data mining systems in mental health evaluation systems.


Introduction
Accompanied by many psychological problems, especially among contemporary students [1, 2], freshmen entering the university campus face a living environment completely different from the past [3]. It is difficult for many students to adapt to this new environment. Moreover, great changes take place in the interpersonal relationships around them [4]: they need to make new friends, leave their parents for the first time, and deal with everything by themselves. Freshmen thus experience many negative psychological emotions, such as depression, anxiety, and loneliness, resulting in a lack of interest in learning, unwillingness to communicate with others, and so on [5]. At present, universities conduct a psychological investigation of students when freshmen enter the university, which has accumulated a large amount of psychological data. However, how can these psychological measurement data be used to obtain more meaningful results, so as to better carry out psychological education? Data mining technology includes many algorithms: cluster analysis (or unsupervised learning), association rule mining, prediction, time series mining, and deviation analysis. We select appropriate algorithms according to the characteristics of the psychological data to achieve the expected data mining objectives, as shown in Figure 1 (data analysis and mining technology) [6]. With the large-scale enrollment expansion of universities, the psychological problems of students have become more prominent; psychological education has received attention, and research on these problems is urgent. Among the psychological problems of students, anxiety and depression have become important risk factors affecting students' physical and mental health. Only through their healthy growth can we ensure that the cause of socialism with Chinese characteristics has successors and prospers.

Literature Review
To solve this research problem, Tang et al. proposed the TAN algorithm (a seminaive Bayesian algorithm). TAN relaxes the assumption of conditional independence between attributes and arranges the result of the naive Bayesian algorithm into a tree structure, allowing each node to depend on at most one node other than the parent node [7]. SP-TAN, proposed by Li et al., is another TAN-style algorithm; it adopts a greedy heuristic search and, when selecting each edge, chooses the edge that improves the accuracy of the whole classifier the most [8]. Tabbakha and Razavi combined lazy learning with TAN, which also weakens the hypothesis of conditional independence [9]. Jie et al. compared LBR (a rule extraction algorithm) with TAN and proposed Lazy TAN (a seminaive Bayesian algorithm) by combining their characteristics; the quality of a classifier cannot be fully evaluated by classification accuracy alone [10]. Liu et al. used the AUC as the measure for adding parent nodes [11]. Usui et al. combined boosting with TAN and proposed a higher-performance classification algorithm [12]. Alharbi and Shahrjerdi proposed another two-layer improved algorithm, which divides the attribute set into a strong attribute set and a weak attribute set: any two attributes in the strong attribute set have dependencies, while the weak attributes are conditionally independent. The result of a Bayesian network is a probability graph model, which has a good structure and does not require conditional independence between attributes [13]. Kumar et al. applied Bayesian networks to the analysis of primary liver cirrhosis and tested the hypothesis with a confidence of 95% [14]. Sharma et al. used the Bayesian belief criterion to establish a Bayesian network model on incomplete data.
A Smart BN has also been proposed, which can be effectively used to predict human actions in video and can dynamically change the number of nodes and the relationships between nodes [15]. Lukyanov et al. assume that all points of a given cluster obey the same probability distribution, and the objects in the data set are assigned according to the maximum probability value in the distribution. Hierarchical clustering, also known as agglomerative clustering, is another well-organized clustering technique. It is based on a greedy algorithm: each time, it calculates the similarity of data points, selects the closest elements to form a class, and then inserts them into the original data set. The end condition of the iteration is that only one point remains in the data set [16].
Through continuous testing and analysis, it is found that the main symptom affecting students' mental health is obsessive-compulsive disorder. Viewing the model with obsessive-compulsive disorder as the classification target, we can see that anxiety disorder and interpersonal relationship also play a great role. We set the target attributes as anxiety and interpersonal relationship and the output variables as the remaining 9 factor variables to mine the main causes of obsessive-compulsive disorder, so as to provide a reference for staff guiding mental health.

Main Process of Data Mining.
After years of exploration and research, the basic process of data mining technology has been summarized [17, 18]. It includes cleaning, extracting, and transforming the required data from the initial uncleaned data to generate a data set, establishing a classification or clustering model on this data set, and finally extracting and analyzing the information [19] (the specific process is shown in Figure 2).

Decision Tree.
The decision tree is a classic classification algorithm with good classification performance, and its resulting model is well interpretable. A decision tree is a tree data structure composed of decision nodes and leaves. A leaf node determines the category of an instance, and the function of an internal node is to determine which node to visit next by comparing the attribute values of the test case. For a discrete attribute A with h possible values, the branches correspond to A = a_1, ..., A = a_h. For a continuous attribute, each node holds a threshold value, and the branch to follow is chosen by comparing the attribute value with the threshold. In fact, the classification process of a decision tree moves an instance from the root to a leaf node: the class label of the leaf node reached is the class label assigned to the instance. The commonly used decision tree algorithms include ID3, C4.5, and CART (as shown in Table 1). Their construction algorithms are similar: all are based on a greedy strategy in which the division of nodes is obtained by calculating an information measure, but each algorithm adopts a different calculation method [20]. The construction of a decision tree is realized recursively by a top-down greedy algorithm. At each internal node, the test attribute with the best classification effect is selected to partition the training sample set, and the process is called recursively to construct the sub-branches until all attributes are used or all training samples belong to the same category. If a data instance has the same type as a node in the decision tree, it is classified into the same class; if the two differ, the instance is placed as a new node in the corresponding position of the decision tree. Repeating this, a decision tree containing only a root node can be extended to a complete decision tree [21], as shown in Figure 3.
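The root-to-leaf classification process described above can be sketched in a few lines. This is a minimal illustration, not the mined model: the tree shape and outcomes are made up, and only the attribute codes (JL = anxiety, PZ = paranoia) follow the paper.

```python
# A minimal sketch of decision-tree classification as root-to-leaf traversal.
# Internal nodes test one attribute; leaves carry a class label.
# The tree below is illustrative only; JL and PZ follow the paper's codes.

def classify(node, instance):
    """Walk from the root to a leaf, following the branch whose
    key matches the instance's value for the tested attribute."""
    while isinstance(node, dict):          # internal node: {attribute: {value: subtree}}
        attribute, branches = next(iter(node.items()))
        node = branches[instance[attribute]]
    return node                            # leaf: a class label

# Illustrative tree: anxiety (JL) at the root, paranoia (PZ) below one branch.
tree = {"JL": {
    "medium": "OCD tendency",
    "none":   {"PZ": {"mild": "mild OCD", "none": "no OCD"}},
}}

print(classify(tree, {"JL": "medium"}))               # OCD tendency
print(classify(tree, {"JL": "none", "PZ": "mild"}))   # mild OCD
```

Each test case thus visits at most one node per tree level, which is what makes the resulting model easy to read off as a set of if-then rules.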

Data Acquisition.
The data used in this study come from the SCL-90 psychological data of first-year students in a university. There are 1643 people in this test, 989 girls and 654 boys [22]. The data mining process of students' mental health evaluation is shown in Figure 4.

Data Preprocessing.
Data preprocessing is an important link in the process of data mining. Data mining usually deals with data containing a lot of noise: fuzzy data, redundant data, or incomplete data. In the mental health evaluation data of students, incomplete and invalid entries caused by students' carelessness or other reasons produce a lot of inaccurate, noisy data. The existence of these worthless data would ultimately affect the accuracy of the mining results. Through data preprocessing, the quality of mining can be greatly improved and the time spent in analysis can be reduced [23-25].

(Figure 2 shows the flow from the initial data through data selection, preprocessing, and conversion to the extracted information and assimilated knowledge.)

ID3 algorithm: the core of the ID3 algorithm is to use the information gain as the selection criterion for attributes at each level of the decision tree, helping to determine the appropriate attribute to use when generating each node.

C4.5 algorithm: the selection of node attributes is determined by the information gain rate; it is a derivation and improvement of ID3. The ID3 decision tree algorithm is usually suitable for discrete description attributes, while the C4.5 decision tree algorithm can handle both continuous and discrete description attributes.

CART algorithm: a very useful nonparametric classification and regression method. The construction and generation of the binary tree usually have three processes: constructing the tree, pruning the tree, and evaluating the tree. When the end point is a categorical variable, the tree is a classification tree; when the end point is a continuous variable, it is a regression tree.

Data Selection.
Data selection is a common data processing method for data analysis and mining in the early stage and is the first step of data preprocessing. Because of the large scale of the original data set, mining and analyzing the whole data set would cost a lot of computing resources and time, so it is necessary to select data from the data set to reduce the impact on the results [26]. Collecting and selecting the information records in the data set according to the mining objectives not only simplifies the data content but also helps to find the internal relations between attributes and the laws hidden behind the data. The useless information in the student basic information table is deleted, including student ID, ID number, name, date of birth, native place, telephone number, and other attributes; this information would only reduce the efficiency of the mining calculation. For the attribute of students' nationality, because the students in the school are mainly Han and there are few other nationalities, deleting nationality has no impact on the mining results.
The useless information in the "student mental health evaluation form" is also deleted, including the student number, gender, department, major, and other attributes, together with the selection scores of the 90 individual questions in the psychological evaluation symptom checklist SCL-90; the 10 psychological dimension factors are retained as the analysis content of the data mining [27].
Finally, the data fields associated with the mining task are determined by deleting the useless attribute information in the above two tables. The data set required from the student basic information table is composed of gender (XB), household registration (HK), only child (DSZN), and family status (JTZZ). The data required from the student mental health evaluation form are composed of obsessive-compulsive symptoms (QPZZ), depression (YY), somatization (QTH), hostility (DD), anxiety (JL), interpersonal sensitivity (RJGX), psychosis (JSBX), phobia (KB), paranoia (PZ), and others (QT).

Data Cleaning.
The main purpose of this operation is to eliminate redundancy, errors, and noise in the data. Data cleaning mainly filters and removes duplicate data, supplements and completes incomplete data, and corrects or deletes wrong data. Duplicate data are mainly records with the same attribute values, incomplete data are mainly records with missing required information, and wrong data are mainly records written directly to the database without validation.
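The three cleaning steps above can be sketched as a single pass over the records. This is a minimal sketch under assumptions: the field names, the example records, and the 1-5 valid score range (taken from the SCL-90 description later in the paper) are illustrative.

```python
# Data-cleaning sketch: drop duplicates, discard incomplete records,
# and reject out-of-range (wrong) score values. Illustrative data only.

def clean(records, score_fields, valid_range=(1, 5)):
    seen, cleaned = set(), []
    lo, hi = valid_range
    for rec in records:
        key = tuple(sorted(rec.items()))                      # duplicate data:
        if key in seen:                                       # same attribute values
            continue
        seen.add(key)
        if any(rec.get(f) is None for f in score_fields):     # incomplete data
            continue
        if any(not (lo <= rec[f] <= hi) for f in score_fields):  # wrong data
            continue
        cleaned.append(rec)
    return cleaned

raw = [
    {"XH": "001", "JL": 3, "YY": 2},
    {"XH": "001", "JL": 3, "YY": 2},     # exact duplicate
    {"XH": "002", "JL": None, "YY": 4},  # incomplete record
    {"XH": "003", "JL": 9, "YY": 1},     # wrong: score outside 1-5
]
print(clean(raw, ["JL", "YY"]))  # keeps only the first record
```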

Data Integration.
Data integration is the process of integrating records from multiple related data sets into a new data set oriented to the mining target. The data used in this paper mainly come from the student basic information table and the SCL-90 mental health evaluation table. The two tables are connected through the associated field XH (student number), and a new student mental health evaluation table is generated from the data set determined in the data selection step, as shown in Tables 2 and 3.
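The join on the shared key XH can be sketched with plain dictionaries standing in for the two tables. The field values below are illustrative, not taken from the actual data set.

```python
# Integration sketch: inner join of the basic-information table and the
# SCL-90 table on the shared key XH (student number). Illustrative data.

def integrate(basic_rows, scl90_rows, key="XH"):
    scl90_by_key = {row[key]: row for row in scl90_rows}  # index one table by XH
    merged = []
    for row in basic_rows:
        match = scl90_by_key.get(row[key])
        if match is not None:                             # keep only matched students
            merged.append({**row, **match})
    return merged

basic = [{"XH": "001", "XB": "XB1", "HK": "HK2"}]
scl90 = [{"XH": "001", "QPZZ": 2, "JL": 3}]
print(integrate(basic, scl90))
# [{'XH': '001', 'XB': 'XB1', 'HK': 'HK2', 'QPZZ': 2, 'JL': 3}]
```

Indexing one table by the key first makes the join a single linear pass rather than a nested loop over both tables.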

Data Specification.
Data specification is a crucial link in data mining processing [28-30]. In data processing, the data must first be converted into a form suitable for data mining. The conversion usually follows the principles of discretizing continuous data and categorizing discrete data. In this paper, the data specification operation is carried out on the information of the "student SCL-90 mental health evaluation form." The main processes are as follows.
Data discretization: discretizing the continuous data of the mental health test scale is helpful for the data mining operation. Each item in the symptom checklist SCL-90 is scored on a scale of 1 to 5, and 10 factors reflect the psychological symptoms. If any of these factors scores more than 2 points, the screening can be regarded as positive. Therefore, the 10 factor scores of psychological symptoms are divided into two intervals, symptomatic and asymptomatic: scores above 2 points are symptomatic and scores of 2 points or below are asymptomatic.
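The discretization rule above (score > 2 → symptomatic) is a one-line threshold test. The factor codes follow the paper; the scores are made up for illustration.

```python
# Discretization rule from the text: a factor score above 2 points is
# coded "symptomatic", otherwise "asymptomatic". Illustrative scores.

def discretize(score, threshold=2):
    return "symptomatic" if score > threshold else "asymptomatic"

factors = {"QPZZ": 2.6, "YY": 1.4, "JL": 3.1}
coded = {name: discretize(s) for name, s in factors.items()}
print(coded)
# {'QPZZ': 'symptomatic', 'YY': 'asymptomatic', 'JL': 'symptomatic'}
```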
Data categorization: there are many attribute values for student household registration and family economic situation, so classification conversion is required before data mining. Finally, household registration is divided into rural (HK1) and urban (HK2), and the family economic situation is divided into difficult families (JT1) and nondifficult families (JT2). XB1 and XB2 represent men and women, respectively; BX1 represents the department of nursing, BX2 the department of pharmacy, BX3 the department of medical technology, BX4 the department of clinical medicine, and BX5 the department of public affairs; DS1 and DS2 indicate whether or not the student is an only child.
All attributes in the student mental health evaluation form are coded following the above principles and specifications; the codes are shown in Tables 4 and 5. The data table of each attribute in the student mental health evaluation table after data specification is shown in Tables 6 and 7.

Constructing the Decision Tree

Basic Strategy of Decision Tree Induction.
Firstly, the splitting criterion of the algorithm is used to find an attribute as the splitting attribute of the training sample set. Then the above method is recursively called on the subset of each branch to establish the branches of the node. With the growth of the tree, the training sample set is recursively divided into smaller and smaller subsets until every subset contains only samples of the same category, that is, a leaf node is reached. Finally, a decision tree classification model similar to a flowchart is generated.

Information Gain.
Let node N store the samples of data partition D. The expected information required to classify a sample in D is given by the following formula:

Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i), (1)

where p_i is the probability that an arbitrary sample in D belongs to class C_i, estimated by |C_{i,D}|/|D|. In fact, formula (1) is based simply on the proportion of the number of samples of each class in the total number of samples. Info(D) is also called the entropy of D; entropy is a statistic used to measure the degree of disorder of a system. Suppose that the samples in D are divided according to attribute A, and attribute A has v different values a_1, a_2, ..., a_v. If attribute A is discrete, it divides D into v subsets D_1, D_2, ..., D_v, where the samples in D_j (j = 1, 2, ..., v) take the value a_j on attribute A. These subsets correspond to the branches growing from node N. The expected information required to classify the samples of D based on attribute A is obtained from the following formula:

Info_A(D) = \sum_{j=1}^{v} (|D_j|/|D|) \, Info(D_j), (2)

where |D_j|/|D| is the weight of the subset whose value on attribute A is a_j. Info_A(D) is the expected information required to classify D based on attribute A. Knowing the value of attribute A leads to a reduction of entropy, which is given by the following formula:

Gain(A) = Info(D) - Info_A(D). (3)

Gain Rate.
Consider splitting on the attribute XH (student number): because everyone's student number is different, there are as many partitions as there are student-number values, these partitions are pure, and each partition contains only one data record.
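The entropy and information-gain quantities described above can be checked numerically with a small sketch. The toy dataset below is illustrative only; it is chosen so the attribute separates the two classes perfectly and the gain equals the full entropy.

```python
# Numeric sketch of entropy and information gain: Info(D) is the entropy of
# the class distribution, Info_A(D) the weighted entropy after splitting on
# attribute A, and Gain(A) their difference. Toy data, illustrative only.
import math

def info(labels):
    """Info(D) = -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_a(rows, attr_index, label_index):
    """Info_A(D) = sum_j (|D_j|/|D|) * Info(D_j)."""
    n = len(rows)
    total = 0.0
    for v in {r[attr_index] for r in rows}:
        subset = [r[label_index] for r in rows if r[attr_index] == v]
        total += len(subset) / n * info(subset)
    return total

def gain(rows, attr_index, label_index=0):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info([r[label_index] for r in rows]) - info_a(rows, attr_index, label_index)

# Each row is (class label, attribute value).
data = [("yes", "a"), ("yes", "a"), ("no", "b"), ("no", "b")]
print(round(gain(data, 1), 3))  # 1.0: the attribute separates the classes perfectly
```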
According to formula (2), the expected information required for the division of the samples of D according to XH (student number) is Info_XH(D) = 0, since every partition contains a single record and is therefore pure. According to formula (3), the information gain of this attribute is then the largest, and it would be preferentially chosen as the splitting attribute. However, for classification, a division based on student number is meaningless. The basic principle of C4.5 is the same as that of ID3; the difference is that C4.5 uses the gain rate instead of the information gain as the attribute selection measure (splitting rule), to make up for the disadvantage that ID3 prefers attributes with many values when the information gain is used to select attributes. The information gain rate is defined as follows:

GainRate(A) = Gain(A) / SplitInfo_A(D). (4)
In formula (4), the split information is used to normalize the information gain. Split information is similar to Info(D) and is defined as

SplitInfo_A(D) = -\sum_{j=1}^{v} (|D_j|/|D|) \log_2(|D_j|/|D|), (5)

where SplitInfo_A(D) represents the information generated by dividing the training sample set D into v partitions corresponding to the v outputs of the test on attribute A.
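The gain-rate correction can be illustrated numerically: splitting on a student-number-like attribute produces many singleton branches with a large split information, so its gain rate is penalized relative to a binary attribute with the same raw gain. The branch counts and the raw gain of 1.0 below are made up for illustration.

```python
# Sketch of the C4.5 gain-rate correction: SplitInfo penalizes attributes
# with many values (such as the student number XH). Toy numbers only.
import math

def split_info(counts):
    """SplitInfo_A(D) = -sum_j (|D_j|/|D|) * log2(|D_j|/|D|)."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gain_rate(gain, counts):
    """GainRate(A) = Gain(A) / SplitInfo_A(D)."""
    return gain / split_info(counts)

# 8 samples: a student-number-like attribute (8 singleton branches) versus a
# binary attribute (two branches of 4), both with an assumed raw gain of 1.0.
print(round(gain_rate(1.0, [1] * 8), 3))  # 0.333 - heavily penalized
print(round(gain_rate(1.0, [4, 4]), 3))   # 1.0
```

This is exactly why a split on XH, despite having the maximal information gain, loses out under the gain-rate criterion.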

Construct the Decision Tree of Students' Psychological Problems.
Steps 1 and 2 are performed recursively on each of the split sub-data sets. The class label attribute RJGX (interpersonal sensitivity) has two different values, 1 (symptomatic) and 0 (asymptomatic); therefore, the training sample set has two different categories. We first calculate the expected information of the classification of training sample set D. Next, the expected information of each split attribute needs to be calculated. Taking XB (gender) as an example, attribute XB has two different values, XB1 (male) and XB2 (female), so the expected information and information gain are computed according to the value of attribute XB. Finally, according to formulas (4) and (5), the gain rate of the attribute is obtained. Using the same method, the information gain rates of the remaining attributes are calculated in turn. The sample is divided into two subsets according to whether the student is an only child or not. The above steps are repeated to classify the sub-data set of each branch, and the branches are derived again. With the increase and extension of the branches, the sample data set is recursively divided into smaller and smaller sub-data sets.
Through continuous testing and analysis, the main symptom affecting students' mental health is obsessive-compulsive disorder. Therefore, when viewing the model with obsessive-compulsive disorder as the classification target, the results shown in Figure 5 can be obtained according to the C4.5 algorithm principle.
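The recursive construction strategy described above can be sketched end to end. This is a pure-stdlib, information-gain-based (ID3-style) sketch rather than the paper's full C4.5 pipeline; the attribute codes follow the paper, but the four toy records are made up.

```python
# Minimal recursive tree builder: pick the attribute with the highest
# information gain, split the sample set, and recurse until a subset is pure.
# Toy records; JL (anxiety), PZ (paranoia), QPZZ (OCD symptom) follow the
# paper's codes, the values do not.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:       # pure subset, or no attributes left
        return Counter(labels).most_common(1)[0][0]
    def gain(a):                                 # information gain of attribute a
        split = Counter(r[a] for r in rows)
        remainder = sum(cnt / len(rows) *
                        entropy([r[target] for r in rows if r[a] == v])
                        for v, cnt in split.items())
        return entropy(labels) - remainder
    best = max(attrs, key=gain)                  # splitting criterion
    return {best: {
        v: build([r for r in rows if r[best] == v],
                 [a for a in attrs if a != best], target)
        for v in {r[best] for r in rows}
    }}

rows = [
    {"JL": "high", "PZ": "mild", "QPZZ": 1},
    {"JL": "high", "PZ": "none", "QPZZ": 1},
    {"JL": "none", "PZ": "mild", "QPZZ": 1},
    {"JL": "none", "PZ": "none", "QPZZ": 0},
]
tree = build(rows, ["JL", "PZ"], "QPZZ")
print(tree)  # e.g. anxiety at the root, paranoia tested below one branch
```

On this toy data the builder reproduces the shape of the mined rules: anxiety is tested first, and paranoia only matters on the no-anxiety branch.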
It can be seen from Figure 2 that anxiety disorder and interpersonal relationship also play a great role.
Set the target attribute as the anxiety level and the interpersonal relationship level, respectively, set the output variables as the remaining 9 factor variables, and execute the data flow. The results are shown in Figures 6 and 7, respectively.
The main causes of obsessive-compulsive disorder are then mined, as shown in Figure 8.

Analysis.
From various angles, and on the whole, the psychological quality of the students is healthy. In this application, the factor of compulsion in students' psychology occupies a relatively high proportion. Anxiety, interpersonal relationship, and paranoia have a great impact on the target attribute, obsessive-compulsive disorder. It can be seen from Figure 2 that if the anxiety level = medium, there is a tendency of obsessive-compulsive disorder regardless of the degree of interpersonal relationship. If the anxiety level = none, then when the paranoia level = [mild, moderate], the obsessive-compulsive symptom level = light; when the paranoia level = none, it is related to interpersonal relationship and hostility; and if the paranoia level = "heavy" or "extremely heavy", the obsessive-compulsive symptom level = none. If the degree of anxiety = light, there is a tendency of obsessive-compulsive disorder regardless of the degree of interpersonal relationship. If the degree of anxiety = severe, the degree of obsessive-compulsive symptoms = moderate.
As can be seen from Figure 3, if the degree of depression = medium, the degree of anxiety = medium. If the degree of depression = none, then when the degree of terror = medium, the degree of anxiety = light, and when the degree of terror = [none, light, heavy], there is almost no anxiety. If the degree of depression = mild and the degree of obsessive-compulsive symptoms = none, there is no anxiety tendency. If the degree of depression = severe, the degree of anxiety = severe.
As can be seen from Figure 4, if the degree of depression = medium, the degree of interpersonal relationship = medium. If the degree of depression = none, then when the degree of terror = light and there is psychosis, the degree of interpersonal relationship = light. If the degree of depression = mild and there is obsessive-compulsive disorder, there are problems in interpersonal relationship.
As can be seen from Figure 5, the mining results show that the causes of students' psychological obsessive-compulsive disorder are mainly distributed in family atmosphere, family structure, and origin. Children from healthy families are full of hope for life and have great confidence in their emotional life. Due to the lack of parental care, the lack of a sense of security, nervous sensitivity, and emotional vulnerability, students whose parents have both died are always timid in doing things, and their psychological problems are very significant. Unsound families with single parents or divorced parents always harm their children's mental health to varying degrees and at different levels.

Conclusion
This paper proposes an influence model for analyzing the effect of mental health level based on a big data mining system. Through continuous testing and analysis, it is found that the main symptom affecting students' mental health is obsessive-compulsive disorder. Viewing the model with obsessive-compulsive disorder as the classification target, we can see that anxiety disorder and interpersonal relationship also play a great role. The target attributes are set as anxiety and interpersonal relationship and the output variables as the remaining 9 factor variables, to mine the main causes of obsessive-compulsive disorder and provide a reference for staff guiding mental health. In future work, association rule algorithms can be used to analyze students' attribute data for more intensive research.
Data Availability

The data that support the findings of this study are available from the author upon reasonable request.

Conflicts of Interest
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.