Data Mining Algorithm for College Students’ Mental Health Questionnaire Based on Semisupervised Deep Learning Method

In recent years, there have been many cases of college students whose psychological problems affected their studies or led them to drop out of school or even commit suicide. College students, as a part of high-level talent, have always been regarded as outstanding members of society. They are assumed by default to have strong psychological qualities, but the reality is disappointing. Various pressures such as academics, social relations, and employment leave college students mentally exhausted, which has led to many tragedies. The timely detection of psychologically abnormal students has become one of the most pressing and thorny issues in major universities. By constructing a mental health state perception model for college students and optimizing the model parameters, the F score of the internal and external tendency model increased by 3.3%, the F score of the depression binary model increased by 2.5%, and the F score of the anxiety binary model increased by 8%. The established model has an obvious effect and can quickly analyze the differences between the in-school behaviors of psychologically abnormal students and normal students, and it also provides a basis for management decisions by college student managers and psychological counselors.


Introduction
The frequent dropout and suicide incidents of recent years show that many students have psychological problems to some degree, and these problems have aroused widespread concern in society. At the same time, people may develop psychological problems at any stage of life for various complex reasons. Early detection of these problems can play a huge role in protecting personal safety. With the rapid development of disciplines such as computer science and mathematics, technologies such as deep learning, data mining, and big data are also making rapid progress and are increasingly integrated into people's daily lives. More and more algorithms are emerging in the field of computer science to solve specific problems, and in many ways they outperform traditional methods. For example, social networks have traditionally been analyzed with indicators such as degree centrality and edge betweenness. These indicators describe some characteristics of social networks well, but they usually express only part of the information in the network and may contain more noise. Network representation learning based on deep learning addresses this problem well, as it usually captures more information in the network for more detailed analysis and identification. This research uses network science, deep learning, network representation learning, and other technologies, together with students' social network data, to identify students who are more likely to have psychological problems. Accurate identification gives families and schools the opportunity to intervene as early as possible and to take targeted measures to solve students' problems. Helping psychologists provide psychological counseling can largely prevent students' psychological problems from worsening; reduce school dropout, suicide, and other incidents; and reduce social tragedies.
At the same time, a new way to efficiently utilize multiview network data is proposed; more potential social information between views is preserved; and the problem of label imbalance that is common in such data is addressed. Identifying students' psychological problems through deep learning technology also provides a direction for future identification of and research on students' psychological problems [1][2][3][4][5][6][7].

Related Work
In recent years, many experts and scholars have realized the importance of educational big data. They have used data mining technology to analyze the data generated by students in school, have achieved remarkable results, and have applied the data mining results to education and teaching. A person's psychological characteristics are often expressed through daily behaviors and routines. Therefore, researchers have recently begun to try to extract information that reflects mental health status from the daily behavior data of college students. Several studies have shown a strong relationship between mental health status and online behavior. Dong Nie et al. explored the relationship between search behavior and personality traits and further attempted to determine how search behavior can be used to identify personality. They collected two data sets: one from a questionnaire on 16 personality factors and the other from web access logs from Internet gateways. By calculating the correlation coefficient between individuals' search behavior and personality, they found some interesting patterns: several specific behaviors correlate strongly with personality, such as directory index search, knowledge search, dwell time, keyword usage, and click habits. Regression analysis showed that most personality dimensions can be predicted from search engine behavior. Ang Li et al. proposed an algorithm for predicting mental health problems from network usage behavior. They recruited 102 college students and used the SCL-90 questionnaire to conduct a psychological survey. From the questionnaire results, they obtained the mental health level of the college students (10 dimensions) and conducted a statistical analysis of the students' online behavior. Based on web usage behavior, they established a computational model for predicting the score of each SCL-90 dimension.
The results show that the Pearson correlation coefficient between the predicted score and the actual score of each dimension ranges from 0.49 to 0.65, and the relative absolute error ranges from 75% to 89%. Zangane and Hariri et al. explored the role of emotional factors in doctoral students' online information retrieval. Their study, with a sample of 50 PhD students, gathered information from recordings of users' facial expressions, Morae software log files, and pre- and postsearch questionnaires. The findings suggest that there is a significant relationship between emotional expression and the individual characteristics of searchers. Searcher satisfaction with search results, Internet search frequency, search experience, interest in search tasks, and familiarity with similar searches were associated with increased happiness. An examination of user emotions during search shows that users with happy emotions spend a lot of time searching for and viewing search solutions. Changye Zhu and Baobin Li et al. proposed a new method to detect depression through time-frequency analysis of network behavior. They recruited 728 graduate students, obtained their depression scores through the Zung self-rating depression scale (SDS), and then collected digital records of their online behavior.
Through time-frequency analysis, they built a classification model to distinguish between high and low SDS groups and a predictive model that more accurately identified the mental state of the depressed group. The experimental results show that both the classification model and the prediction model reflect changes in mental health well and that the time-frequency features are better at reflecting these changes. The research at this stage is mainly limited to the behavior analysis of specific groups and the psychological analysis and interpretation of the influencing factors. However, the continuous update and iteration of data mining technology has given a great boost to research work in psychology. Driven by data mining technology, the construction of university informatization platforms has steadily improved. The use of mathematical models in machine learning makes it possible to predict the psychological state of students. As the work of the above scholars shows, the analysis of college education data has always been a research hotspot in the field of data mining. By mining students' in-school education data, researchers use data mining technology and analysis theory to draw conclusions about students' academic performance, social relations, and students in financial difficulty, which provides data support and a theoretical basis for the construction of college informatization and smart campuses and also offers valuable guidance and suggestions for college administrators [8][9][10][11][12][13][14].

Data Processing Technology
3.1.1. Data Filtering. Although data mining is a method for analyzing massive data, this does not mean that all the collected data must be used, because the future use of the data was not specifically considered when the data were collected in the early stage. As a result, the value density of these data is very low, which is not conducive to later data analysis. Therefore, when there are specific research goals, only the data useful for the target analysis need to be selected. For this study, the original data are the records of students' online behavior in school collected through the network system. The key information in the records is the student ID, the website visited, the type of website visited, the time of the visit, and so on. The rest of the information, such as parameters and codes designed by the log system to ensure security, is useless for our analysis and is not used.

Data Cleaning. Data cleaning mainly addresses poor data quality caused by accidental factors in the useful data that remain after data filtering. The main means are as follows:
Missing value processing: when records with missing values account for a relatively low proportion of the whole data set and the sample is relatively large, the deletion method can be used, that is, data items with missing values are directly discarded. Another common method is the filling method, which is used when the data volume is not particularly large and many samples have missing values: the missing value is filled with the average of nearby values in the same dimension.
Outlier processing: it is also called abnormal value processing or error value processing. If the value of one dimension in a data item is far greater or far less than the values of the other data items in the sample, the data item is called an outlier. Outlier data cannot be discarded directly. It is necessary to analyze them to judge whether they are reasonable and then decide on the processing strategy. For example, in real life, a person's age is greater than 0 and less than 150. When the value in a data item falls outside this range, it can be regarded as an abnormal value. This is a simple analysis method.
Duplicate data processing: when one dimension can be obtained by adding or subtracting several other dimensions, it expresses the same meaning as those dimensions and is redundant. Eliminating duplicate data dimensions helps reduce the data size and the burden on the model, and it ensures the uniqueness and representativeness of the data dimensions.
Noise data processing: noise is the random error or variance in measured data, which interferes with the data. The commonly used methods are the binning ("box division") method and the regression method. The binning method groups nearby ordered values into small groups, namely "boxes," and then smooths the ordered values with the mean, median, or boundary of the data in each box so that the data become locally smooth. The regression method fits the noisy data with a regression function and thereby smooths and denoises the data [15].
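The cleaning steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the neighbor window used for filling, and the sample values are assumptions made for illustration, while the age range (0, 150) is taken from the text.

```python
from statistics import mean

def fill_missing_with_neighbor_mean(values, window=1):
    """Replace None entries with the mean of the non-missing neighbors
    within `window` positions on each side (assumed filling rule)."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            lo, hi = max(0, i - window), min(len(values), i + window + 1)
            neighbors = [x for x in values[lo:hi] if x is not None]
            if neighbors:  # leave the gap if no neighbor is available
                filled[i] = mean(neighbors)
    return filled

def is_abnormal_age(age, low=0, high=150):
    """Range check from the text: a valid age is greater than 0 and less than 150."""
    return not (low < age < high)

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into consecutive bins of `bin_size`,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed
```

Smoothing by bin medians or bin boundaries, which the text also mentions, would only change the statistic applied inside each bin.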

Data Conversion. Data conversion processing, also called data mapping processing, generally covers three cases. The first is the encoding conversion of text data. Since a computer cannot directly compute on text data, for example to calculate the distance between two data items, the text must be numerically encoded. For example, for the gender attribute, male is coded as 1 and female is coded as 0; common encoding methods include one-hot encoding. The second is format conversion. For example, date data need to be converted into a unified format to facilitate subsequent analysis and processing. The third is the mathematical processing of numerical data. For example, when the values of a certain dimension change exponentially, they can be converted by formula (1) into small numerical values that are convenient for observation and analysis [16]. In the same way, when the data change in the form of a power function, they can be processed by taking the n-th root and converted by formula (2) into small numerical values that are easier to observe, where y is the converted value and x is the value before conversion.
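The three conversion cases can be sketched in Python as follows. Formulas (1) and (2) are not reproduced in the text, so the logarithmic form y = log(x) with base 10 is an assumption, while the n-th root follows the description directly. The helper names and the category list used in the usage example are hypothetical.

```python
import math

# Gender coding given in the text: male -> 1, female -> 0.
GENDER_CODES = {"male": 1, "female": 0}

def one_hot(value, categories):
    """Return a one-hot list for `value` over the ordered `categories`."""
    return [1 if value == c else 0 for c in categories]

def log_transform(x, base=10):
    """Compress exponentially growing values (assumed form: y = log_base(x))."""
    return math.log(x, base)

def nth_root(x, n):
    """Compress power-law values by taking the n-th root: y = x ** (1 / n)."""
    return x ** (1.0 / n)
```

For instance, `one_hot("music", ["shopping", "music", "game"])` gives `[0, 1, 0]`, and `log_transform` maps the values 10, 100, 1000 onto an evenly spaced scale.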
3.1.4. Data Integration. The students' on-campus network behavior data are recorded by the log system and saved in log format, while some basic information about the students is saved in the student management system. During data analysis and model training, operating on the data in each system separately is cumbersome and error-prone, so a data integration method is needed to extract these data and save them in the same environment for processing.
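A minimal sketch of such an integration step is given below, assuming that both systems share a student ID. The field names (`student_id`, `website_type`, `gender`, `age`) are hypothetical and only illustrate joining log records with basic student information.

```python
def integrate(log_records, student_info):
    """Attach each student's basic information to that student's log records.

    log_records: list of dicts with a "student_id" key (from the log system)
    student_info: dict mapping student_id -> dict of basic attributes
    Records whose student is not in student_info are skipped.
    """
    merged = []
    for record in log_records:
        info = student_info.get(record["student_id"])
        if info is not None:
            merged.append({**record, **info})
    return merged
```

Skipping unmatched records is one possible policy; keeping them with empty attributes would be an equally reasonable choice.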

Data Analysis Technology.
The biggest difference between logistic regression analysis and linear regression analysis is the data type of the dependent variable Y. Logistic regression analysis can analyze discrete categorical data, while linear regression analysis can only analyze continuous data [17].

Logistic Distribution
Definition 1 (Logistic distribution). Let x be a continuous random variable. If x has the following distribution function and density function, then x is said to obey the logistic distribution.
The distribution function and density function plots are shown in Figure 1.
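The distribution and density functions are not reproduced in the text. For reference, the standard logistic distribution can be written as below, where X denotes the random variable; the symbols μ (location parameter) and γ > 0 (scale parameter) are assumptions and may differ from the notation of the original equation.

```latex
F(x) = P(X \le x) = \frac{1}{1 + e^{-(x - \mu)/\gamma}}, \qquad
f(x) = F'(x) = \frac{e^{-(x - \mu)/\gamma}}{\gamma \left(1 + e^{-(x - \mu)/\gamma}\right)^{2}}
```

The distribution function is an S-shaped curve that is symmetric about the point (μ, 1/2), and a smaller γ makes the curve steeper near x = μ.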

Binary Logistic Regression Analysis.
Binomial logistic regression is suitable for the case in which the dependent variable y takes only two values; it is essentially a model that predicts classification probabilities. The value of the variable x is real. The probability models for y = 1 and y = 0 are given in equations (4) and (5), where x ∈ R^n is the input of the model, y ∈ {0, 1} is the output of the model, ω ∈ R^n and b are the parameters, ω is called the weight vector, b is called the bias, and ω · x is the inner product of ω and x. Given an input vector x = (x_1, x_2, x_3, ..., x_n), the logistic regression model computes p(y = 0 | x) and p(y = 1 | x) for that input. The concepts of the odds and the log odds of an event are introduced here. Let P be the probability that an event occurs, so that 1 - P is the probability that it does not occur. The odds of the event are the ratio between the two, that is, P/(1 - P); taking the logarithm gives the log odds of the event, log(P/(1 - P)), as shown in formula (7) [18]. It can be seen from formula (7) that the log odds of y = 1 in logistic regression are actually a linear function of the input x. Logistic regression analysis essentially fits this linear function so that it separates the two categories of the original data as well as possible. The larger the value of the linear function, the greater the log odds and the closer the classification is to class 1; conversely, the smaller the value, the closer it is to class 0.
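The relationship between the probability model and the log odds can be checked numerically with the short sketch below. Since the equations are not reproduced in the text, it assumes the standard form P(y = 1 | x) = exp(ω · x + b) / (1 + exp(ω · x + b)); the function names and the example values are illustrative only.

```python
import math

def linear_score(w, x, b):
    """Inner product of the weight vector and the input, plus the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def prob_y1(w, x, b):
    """P(y = 1 | x), written in the equivalent sigmoid form 1 / (1 + exp(-z))."""
    z = linear_score(w, x, b)
    return 1.0 / (1.0 + math.exp(-z))

def prob_y0(w, x, b):
    """P(y = 0 | x) = 1 - P(y = 1 | x)."""
    return 1.0 - prob_y1(w, x, b)

def log_odds(p):
    """Log odds log(p / (1 - p)) of an event with probability p."""
    return math.log(p / (1.0 - p))
```

With ω = (0.5, -0.25), x = (2, 4), and b = 0.1, the linear score is 0.1, and `log_odds(prob_y1(...))` returns the same value up to rounding, which is the linearity property the text describes.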

Classification Algorithm.
The essence of a classification algorithm is to train a classifier on a labeled data set so that it can divide a new data set. Evaluating the results of a classification algorithm relies on the confusion matrix, a computing tool often used with classification algorithms. Figure 2 shows the composition of the confusion matrix [19]. The basic elements of a classification problem are as follows: training data, that is, the sample data set used to learn the model; features, that is, the attributes used to describe the data and the basis of classification; the model, that is, the external framework of the classifier; the algorithm, that is, the method of constructing classification rules; and evaluation, that is, the final assessment of the effect of the model. The purpose of a classification algorithm is to mine the hidden rules in the labeled data so as to divide the data set in the feature space.
There are four parameters in the confusion matrix:
TP (true positive): positive tuples correctly classified by the classifier.
TN (true negative): negative tuples correctly classified by the classifier.
FP (false positive): negative tuples misclassified as positive by the classifier.
FN (false negative): positive tuples misclassified as negative by the classifier.
The evaluation indexes of the classification model can be calculated from these four parameters:
Accuracy rate: also known as the overall recognition rate, it measures how well the classification model correctly identifies the data as a whole: $\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$.
Precision: it reflects the proportion of correct classifications among the results that the model assigns to a category, that is, the accuracy of the model when it judges that category: $\mathrm{precision} = \frac{TP}{TP + FP}$.
Recall rate: also known as sensitivity, it reflects the sensitivity of the classification model to each category of data, that is, the proportion of a certain type of data that the model correctly identifies: $\mathrm{recall} = \frac{TP}{TP + FN}$.
F score: since precision and recall tend to be negatively correlated, their harmonic mean is used as a new indicator to measure the overall performance of the model on both: $F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$. The value of this indicator ranges from 0 to 1. The larger the value, the better the effect of the model.
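The four indicators can be computed directly from the confusion matrix counts, as in the following sketch. The function name and the zero-division handling (returning 0.0 when a denominator is zero) are assumptions rather than part of the original method.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F score from confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}
```

For example, with TP = 40, TN = 45, FP = 10, and FN = 5, the accuracy is 0.85 and the precision is 0.8, while the F score equals 2·TP / (2·TP + FP + FN) = 80/95.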

Ensemble Algorithm.
There is no perfect algorithm model in the field of data mining. The idea of an ensemble algorithm is to combine different algorithm models through a certain strategy so as to improve the overall model's ability to classify data sets.
There are two main strategies for ensemble learning algorithms: bagging and boosting. The main idea of the bagging strategy is to combine the results of the base classifiers and then determine the final classification result of the overall model by voting, which can effectively increase the stability of the classification. When a base classifier is trained, part of the sample data is drawn by the bootstrap method to construct its training data set.
This kind of "incomplete learning" makes the base classifiers differ from one another, so that their individual errors are less likely to coincide in the vote.
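A minimal sketch of the bagging strategy is shown below: bootstrap sampling followed by majority voting. The base classifier is a toy one-feature threshold rule chosen only to keep the example self-contained. It is an assumption, not the base learner used in this study, and it presumes that the positive class has larger feature values.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Toy base classifier: threshold halfway between the two class means.

    sample: list of (feature_value, label) pairs with labels 0 or 1.
    """
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:  # degenerate sample: predict the only class seen
        only = 1 if pos else 0
        return lambda x: only
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0

def bagging_predict(data, x, n_classifiers=11, seed=0):
    """Train each base classifier on its own bootstrap sample and take a majority vote."""
    rng = random.Random(seed)
    votes = [train_stump(bootstrap_sample(data, rng))(x)
             for _ in range(n_classifiers)]
    return Counter(votes).most_common(1)[0][0]
```

An odd number of base classifiers avoids ties in the binary vote. A random forest follows the same scheme with decision trees as base classifiers and an additional random choice of features at each split.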

Feature Construction of a Mental Health State Perception Model Based on Deep Learning

Data Preprocessing.
The network log data mainly come from a dedicated network log collection server. When a user's application accesses the network, the links accessed by the user are collected, which yields the user's network log information. The main content of the log information is "a record of a user accessing a certain network type at a certain point in time." A sample of log information is shown in Table 1 [20].
Compared with the website type, the website name attribute is too fine-grained: two differently named websites may both clearly belong to the shopping category, and there is no need to distinguish between them. Therefore, when feature dimensions are extracted from the log information, the website name is not a required item. To facilitate processing, the point-in-time information is divided into two parts: the year-month-day part is used as "date," and the hour:minute:second part is used as "time." The finally extracted feature dimensions are student ID, gender, age, date, time, and website type; the format is shown in Table 2 [21].
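The extraction step can be sketched as follows. The exact log layout of Table 1 is not reproduced here, so the record keys and the timestamp format "YYYY-MM-DD HH:MM:SS" are assumptions made for illustration.

```python
from datetime import datetime

def extract_features(record):
    """Keep the six feature dimensions and split the timestamp into date and time.

    record: dict with hypothetical keys student_id, gender, age, timestamp,
    website_name, and website_type. The website name is dropped.
    """
    ts = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    return {
        "student_id": record["student_id"],
        "gender": record["gender"],
        "age": record["age"],
        "date": ts.strftime("%Y-%m-%d"),
        "time": ts.strftime("%H:%M:%S"),
        "website_type": record["website_type"],
    }
```

Parsing the timestamp with `strptime` rather than splitting the string also rejects malformed entries, which fits the cleaning goals described earlier.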

Static Variable Analysis Based on Binary Logistic Regression. As shown in Table 2, there are two static variables in this research: gender and age. Static variables are attribute variables that are unchanged or basically unchanged over a long time. First, univariate binary logistic regression analysis was performed on the gender variable and on the age variable. The dependent variables were divided into three groups: extroverted "1" and introverted "0" formed one group, depression "1" and asymptomatic "0" formed another, and anxiety "1" and asymptomatic "0" formed the third.
Next, a multivariate binary logistic regression analysis was performed with gender and age together as the independent variables, and the results are shown in Table 3.
It can be seen from the results in Table 3 that the results of the multivariate logistic regression are the same as those of the univariate analysis. Combining the results of the two analyses, it can be concluded that the gender and age factors are not statistically significant for the psychological state indicators and can be ignored.

Feature Construction Based on Information Entropy.
This study proposes two concepts for the design and construction of features: the regularity of surfing behavior and the degree of dependence of surfing behavior. The regularity of surfing behavior measures how regularly a student visits different types of web pages within a period of time. Its design is based on Shannon's information entropy, which can measure the order and purity of a data set. Therefore, a method for calculating the regularity of students' online behavior is designed based on the theory of information entropy. Take online shopping behavior as an example, and let the shopping regularity be SR. The number of shopping visits per day is divided into the intervals [0, 5], [6, 11], [12, 25], [26, ...), and each day is assigned to an interval according to its number of shopping visits. This gives the frequency distribution over the n intervals, C = {C1, C2, ..., Cn}, from which the probability p corresponding to each interval is obtained. The shopping regularity SR is then calculated from these probabilities according to the information entropy. The interval probability method is also used to calculate the degree of dependence on online behavior. As above, the number of daily visits of a certain type is divided into intervals, and the interval into which the daily counts fall most frequently during the statistical period is determined. That interval is taken to represent the student's dependence on this type of network behavior. If it is a low-order interval, the degree of dependence is light, and if it is a high-order interval, the degree of dependence is higher.
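The regularity calculation described above can be sketched as follows. The formulas are not reproduced in the text, so the sketch assumes p_i = C_i / (C_1 + ... + C_n) and takes SR to be the Shannon entropy with a base-2 logarithm; the function names are illustrative. The interval bounds are those given for shopping, with daily counts assumed to be non-negative.

```python
import math
from bisect import bisect_right

# Lower bounds of the intervals [0, 5], [6, 11], [12, 25], [26, ...) from the text.
SHOPPING_LOWER_BOUNDS = [0, 6, 12, 26]

def interval_index(count, lower_bounds):
    """Index of the interval that a non-negative daily count falls into."""
    return bisect_right(lower_bounds, count) - 1

def interval_probabilities(daily_counts, lower_bounds):
    """p_i = C_i / sum_j C_j, where C_i is the number of days in interval i."""
    freq = [0] * len(lower_bounds)
    for c in daily_counts:
        freq[interval_index(c, lower_bounds)] += 1
    total = sum(freq)
    return [f / total for f in freq]

def regularity(daily_counts, lower_bounds):
    """Assumed form of SR: Shannon entropy -sum(p_i * log2(p_i)) over the intervals."""
    probs = interval_probabilities(daily_counts, lower_bounds)
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

Under this assumption, a student whose daily counts always fall into the same interval has an entropy of 0 (highly regular behavior), while counts spread evenly over the four intervals give the maximum value of 2.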
This research encodes mild dependence as "1," moderate dependence as "2," and high dependence as "3." When the intervals for the degree of dependence are defined, two cases are considered. For data of the social platform type, such as microblog data, the behavior is divided into two types: browsing microblogs and publishing microblogs. For browsing microblogs, [0, 30] times is the low dependence interval, [31, 60] times is the medium dependence interval, and more than 60 times is the high dependence interval. For publishing microblogs, [0, 10] times is the low dependence interval, [10, 19] times is the medium dependence interval, and more than 20 times is the high dependence interval. The feature dimensions of the final constructed sample data set are shown in Table 4.
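The dependence coding can be illustrated with the browsing intervals given above. This is a sketch under stated assumptions: the function names are hypothetical, and the text does not say how ties between equally frequent levels are resolved, so the level that reaches the highest frequency first in the data is returned here.

```python
from collections import Counter

def browsing_level(daily_visits):
    """Map a daily browsing count to a level using the intervals from the text:
    [0, 30] -> 1 (low), [31, 60] -> 2 (medium), more than 60 -> 3 (high)."""
    if daily_visits <= 30:
        return 1
    if daily_visits <= 60:
        return 2
    return 3

def dependence_degree(daily_counts, level_fn=browsing_level):
    """Return the level into which the daily counts fall most frequently
    over the statistical period."""
    levels = Counter(level_fn(c) for c in daily_counts)
    return levels.most_common(1)[0][0]
```

Passing a different `level_fn` allows the same routine to be reused for other behavior types, such as publishing microblogs, once their interval bounds are fixed.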

Feature Selection Based on Genetic Algorithm.
After the uncorrelated static variables are removed, the sample data set retains the regularity and degree of dependence of the various network types together with the student ID. For each pair of labels, not all types of network behavior data are helpful for model training, and redundant data participating in the training will reduce the accuracy of the model. Therefore, this study uses the adaptive iterative ability of the genetic algorithm to perform feature selection separately for each label state. The iteration graph of the feature dimensions obtained by the genetic algorithm is shown in Figure 3, where the horizontal axis is the number of feature combinations and the vertical axis is the fitness function score.
Based on the internal and external tendency labels, the effect is best when 8 feature dimensions are extracted. Based on the depression label, the effect is best when 5 feature dimensions are extracted. These feature dimensions are shopping dependence, listening to music, WeChat dependence, map website regularity, and game dependence. Based on the anxiety label, the effect is best when 5 feature dimensions are extracted. These feature dimensions are short video website dependence degree, information website dependence degree, music listening regularity, game regularity, and shopping regularity.
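The idea of genetic feature selection can be sketched as below. The operators (truncation selection, single-point crossover, bit-flip mutation), the hyperparameter values, and the toy fitness function are all assumptions, since the text does not specify them. In the study, the fitness would presumably be a classification score of the model trained on the selected features.

```python
import random

def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask (n_features >= 2) that maximizes `fitness`."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
        best = max(population + [best], key=fitness)
    return best

# Hypothetical toy fitness: features 0, 2, and 5 help, every other feature costs 0.5.
INFORMATIVE = {0, 2, 5}

def toy_fitness(mask):
    useful = sum(bit for i, bit in enumerate(mask) if i in INFORMATIVE)
    useless = sum(bit for i, bit in enumerate(mask) if i not in INFORMATIVE)
    return useful - 0.5 * useless
```

Calling `genetic_feature_selection(8, toy_fitness)` returns a mask of length 8; positions set to 1 mark the features that would be kept for training.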

Experimental Environment.
The experimental environment and related parameters are shown in Table 5.

Data Preparation.
The data used in this experiment consist of two parts: label data and feature dimension data. The label data are processed as follows. When the psychological state information of internal and external tendency is labeled, the data with an external tendency score are classified as label "1," and the data with an internal tendency score are classified as label "0." For the pair of labels with or without depressive symptoms, a score of 4 is taken as the threshold. The data less than the threshold are classified as the label without depressive symptoms "0," and the data greater than the threshold are classified as the label with depressive symptoms "1." Similarly, for the pair of labels with or without anxiety symptoms, a score of 4 is also used as the threshold. The data less than the threshold are classified as the label without anxiety symptoms "0," and the data greater than the threshold are classified as the label with anxiety symptoms "1." Examples of specific sample data sets are shown in Table 6.
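The threshold rule for the depression and anxiety labels can be written as a one-line function. The text only specifies scores strictly below and strictly above the threshold, so assigning a score equal to the threshold to label 0 is an assumption of this sketch.

```python
def binarize_score(score, threshold=4):
    """Return 1 (symptomatic) if score > threshold, otherwise 0 (asymptomatic).

    Assumption: a score exactly equal to the threshold is labeled 0,
    because the source does not specify this case.
    """
    return 1 if score > threshold else 0
```

Applied to the scale scores 2, 4, and 6, the function returns the labels 0, 0, and 1.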

Comparison and Analysis of Model Experiment Results. The optimal model for all three mental state models is the random forest model, which performs strongly in the field of classification. The parameters of the random forest model are adjusted and optimized mainly through the grid search method. The adjustment mainly involves the following parameters: (1) n_estimators: the maximum number of iterations, or the number of weak learners. If n_estimators is too small, the model tends to underfit; if it is too large, the computational cost grows while the improvement usually levels off, so a suitable value of n_estimators is important. (2) min_samples_split: the minimum number of samples required to split an internal node. (3) min_samples_leaf: the minimum number of samples at a leaf node. (4) max_depth: the maximum depth of the decision tree. Taking the GA-RF model for the binary classification of depression as an example, the process of parameter adjustment is as follows. In the first step, n_estimators is taken as the variable, with an initial value of 10 and a step of 10; the output result is shown in Figure 4.
In the second step, (min_samples_split, max_depth) is used as the parameter combination; the starting value of min_samples_split is set to 100, and each change is 200; and the starting value of max_depth is set to 3, and each change is 2, and the output result is shown in Figure 5.
In the third step, (min_samples_split, min_samples_leaf) is used as the parameter combination; the starting value of min_samples_split is set to 20, and each change is 10; and the starting value of min_samples_leaf is set to 60, and the output results are shown in Figure 6.
After the above three steps are considered together, the optimal parameter combination of the depression model is obtained: n_estimators = 50, min_samples_split = 150, max_depth = 7, and min_samples_leaf = 25. The optimal parameters of the other two models are obtained by the same procedure, and these parameters are substituted into the models. The F scores of the models are shown in Table 7.
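The staged grid search can be sketched generically as below. The scoring function is a hypothetical stand-in for the cross-validated F score of the random forest, and the upper bounds of the search ranges are assumptions, since the text gives only the starting values and step sizes. The parameter names match those of scikit-learn's RandomForestClassifier, although the text does not state which library was used.

```python
from itertools import product

def grid_search(score_fn, grid, fixed=None):
    """Evaluate score_fn on every combination in `grid` and return
    (best_params, best_score). `fixed` holds parameters tuned in earlier steps."""
    fixed = dict(fixed or {})
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = {**fixed, **dict(zip(names, values))}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Hypothetical score that peaks at the optimum reported in the text."""
    target = {"n_estimators": 50, "min_samples_split": 150,
              "max_depth": 7, "min_samples_leaf": 25}
    return -sum(abs(params.get(k, 0) - v) / v for k, v in target.items())

# Step 1: tune n_estimators alone (start 10, step 10; upper bound assumed).
step1, _ = grid_search(toy_score, {"n_estimators": range(10, 101, 10)})
# Step 2: tune (min_samples_split, max_depth) with n_estimators fixed.
step2, _ = grid_search(
    toy_score,
    {"min_samples_split": range(100, 901, 200), "max_depth": range(3, 14, 2)},
    fixed=step1,
)
```

Tuning the parameters in stages keeps the number of evaluated combinations small, at the cost of possibly missing interactions between parameters tuned in different steps.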
It can be seen from Table 7 that the F scores of the models improved after the optimization of the model parameters. For the internal and external tendency model, the F score increased from 0.765 to 0.79, an increase of 3.3%. For the depression binary model, the F score increased from 0.81 to 0.83, an increase of 2.5%. For the anxiety binary classification model, the F score increased from 0.75 to 0.81, an increase of 8%.
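The percentages above are relative improvements over the score before optimization, which can be verified with a short calculation (the function name is illustrative):

```python
def relative_gain_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100

gains = [
    round(relative_gain_percent(0.765, 0.79), 1),  # internal and external tendency model
    round(relative_gain_percent(0.81, 0.83), 1),   # depression binary model
    round(relative_gain_percent(0.75, 0.81), 1),   # anxiety binary model
]
```

Rounded to one decimal place, the three gains are 3.3, 2.5, and 8.0, which agrees with the values reported from Table 7.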

Conclusion
Nowadays, the topic of students' mental health has attracted more and more attention from society. For example, incidents of college students committing crimes or suicide because of abnormal psychology have often caused heated public discussion. At present, most students have an insufficient understanding of mental illness or even neglect it, so students with mental abnormalities cannot be found and treated effectively in time. Identifying these students and conducting interventions is a top priority in student management. The development of data mining technology has made it practical to construct and analyze such data analysis models. This paper studies a psychological state prediction model that captures students' psychological state information from the students' online data collected on the university campus and from psychological assessment scale indicators. This research is based on deep learning theory. The model is constructed, analyzed, and adjusted with the aim of grasping the psychological state information of students more comprehensively and accurately through their in-school network behavior data. The results of the model experiments show that the F scores of the three models have improved. The models used in this study are only classification models in the field of machine learning, and the more popular deep learning models are not used. The next step will be to use deep learning network structure models in the experiments and to compare the performance of the two kinds of models.
Data Availability
The data set can be accessed upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest.