Analysis of College Students’ Public Opinion Based on Machine Learning and Evolutionary Algorithm

The recent information explosion may have many negative impacts on college students, such as distraction from learning and addiction to meaningless and fake news. To avoid these phenomena, it is necessary to assess students' state of mind and give them appropriate guidance. However, several peculiarities of this task, including its focus on the subject, its multiple aspects, and the low consistency of interests across samples, pose great challenges to mainstream opinion mining methods. To solve this problem, this paper proposes a new approach that uses a questionnaire covering most aspects of a student's life to collect comprehensive information and feeds this information into a neural network. With reliable predictions of students' state of mind and awareness of feature importance, colleges can give students guidance associated with their own experience and make macroscopic policies more effective. A pipeline is proposed to relieve overfitting when training on the collected information. First, singular value decomposition is used in the pretreatment of the data set, which includes outlier detection and dimension reduction. Then, a genetic algorithm is introduced into the training process to find proper initial parameters for the network, which prevents the network from falling into a local minimum. A method of calculating the importance of students' features is also proposed. The experimental results show that the new pipeline works well and that the predictor has high accuracy on fresh samples. The design procedure and the predictor provide suggestions for dealing with students' state of mind and with public opinion in colleges.


Introduction
Youth is the most important period for college students to establish a mature outlook on life and values. In college, students' perceptions of life and of various matters make up public opinion, which can in turn influence the ideology of students. The advent of the Internet has increased the diversification of mass media, which makes it possible for people to obtain information that interests them at any time and in any place. However, the quality and reliability of this information vary more and more widely. Some untrue and negative information might pollute public opinion in college and have a harmful influence on students' state of mind. At the personal level, research studies have shown that addiction to the Internet and to wireless mobile devices such as smartphones is related to increased stress and anxiety and to decreased academic performance and satisfaction with life [1,2]. These impacts could make students take a pessimistic view and feel that their lives are meaningless, feelings that are strongly related to depressive disorder and even suicide. At the social level, the spread of rumors could make students more suspicious and lead them to regard social media and the government as liars [3]. When students enter society after graduation, their distrust of the government will leave room for disharmony. A student's state of mind is the cell of public opinion in college, and there is strong evidence that students in positive environments are more likely to make great achievements [4]. To protect students from the negative impact of the information explosion, colleges should focus on giving guidance to students with problems of mind, take responsibility for helping them correct their outlook on life and values, and make them willing to work for the development of the whole human race.
However, students are usually unwilling to seek guidance on their state of mind because many of them do not want to be regarded as "sick." This requires colleges to actively offer guidance to students. But if students tend to hide their problems, it is difficult for colleges to know who needs guidance among thousands of students. One solution is to use machine learning (ML) tools such as the neural network (NN) to predict students' state of mind. ML tools can automatically learn the function from students' features to their state of mind and make predictions quickly and accurately as long as there are enough training data. With precise predictions of students' state of mind, colleges can adjust their guidance according to each student's own features to enhance its effectiveness.
ML has been widely used to predict people's opinions by analyzing text collected from the Internet, but this approach might be less useful for predicting students' state of mind. That is because the prediction of a student's state of mind has several peculiarities: (1) focus on the subject: this work focuses on the people who make judgements, not on the judgements they have made; (2) multiple aspects: to improve the correctness of the analysis, the predictor should learn plenty of information from different aspects, but students might not voluntarily publish some of this information on the Internet; (3) low consistency on aspects: different students pay attention to different matters, so it would be arbitrary to treat the answer to any single question as a common criterion. To meet these peculiarities, more abundant data that cover most aspects of opinions related to a student's daily life should be collected for each sample, and the data of different samples should be consistent in content. If only text-based data from the Internet are collected, the data set will not be effective enough. In contrast, the traditional method of using a questionnaire to collect data can better meet these requirements. The questionnaire used here is designed to cover most aspects of college students' lives, and its scale questions help quantify students' sentiment on different issues. A questionnaire also makes all students answer the same questions, so that the data of different samples are highly consistent in the aspects they cover.
The ML tool used as the predictor is the NN. For a predictor, one of the most important criteria is generalization performance, which means the prediction accuracy on fresh samples. However, the high dimension of the samples makes them too sparse to fill the sample space. In the training process of an NN, the lack of samples can cause overfitting [5]. An overfitted NN fits the training set well but has poor prediction accuracy on fresh samples. As a result, a new way is needed to solve this problem. This paper introduces an approach that uses singular value decomposition (SVD) to reduce the dimension directly and adds a closed loop based on the genetic algorithm (GA) to the training process to relieve overfitting. After an NN with good generalization performance is obtained, a method of calculating the importance of each feature is also proposed, which can help colleges combine macroscopic policies with microscopic guidance and strengthen overall effectiveness.
Section 2 reviews the related work. Section 3 introduces the process of using SVD to pretreat the data set. Section 4 introduces the method of obtaining a predictor with good generalization performance and the way of calculating feature importance. Section 5 describes the details of the experiment and shows the results. Section 6 concludes our study and introduces future work.

Related Works
Early research studies on mining human opinions focused on text. Pang et al. [6] collected review data from IMDb and used different machine learning tools, such as naive Bayes classification, maximum entropy classification, and support vector machines, to classify audiences' sentiment towards movies. Khan et al. [7] analyzed abundant text on Twitter related to specific products and services and summarized users' overall views of those objects to help producers and service providers improve their work. Zhan et al. [8] designed an algorithm that not only mined opinions from customer reviews but also automatically pointed out the salient topics in these opinions, which makes the analysis more targeted. Zhou et al. [9] transformed customer reviews into answers to a questionnaire generated automatically by the algorithm and analyzed the collected data to identify the main points for improving user experience. Besides research studies that focus on objects, several others focus on people. For example, Kosinski et al. [10] used "Facebook Likes" to predict a range of highly sensitive personal attributes and achieved high accuracy on some classification problems. Baik et al. [11] used buying behaviors to predict people's scores on four different personality traits and showed better precision than previous studies. Besides the abovementioned studies in different applications, some researchers have summarized the work in the whole field of public opinion mining. Pang and Lee [12] focused on improving methods to address the new challenges raised by opinion mining. Tsytsarau and Palpanas [13] tried to give a definition of opinion mining to clarify the basic work that should be done to mine public opinion. Ravi and Ravi [14] divided research studies into different levels and summarized the characteristics of each level. These summaries provide researchers with powerful tools for opinion mining and give criteria for assessing their work.
The method of using a questionnaire to collect data has been widely used in situations where it is necessary to establish a person's comprehensive personality profile. Topp et al. [15] reviewed 213 relevant articles to check the utility of a questionnaire named the WHO-5 Well-Being Index and confirmed its validity both in depression screening and in outcome measurement in clinical trials. Garfinkel et al. [16] used a questionnaire to measure interoceptive sensibility, which is an important dimension of one's interoception and could help explain cognitive, emotional, and clinical associations of interoceptive ability. Duckworth and Yeager [17] considered a self-report questionnaire to be more efficient than other measures in studies that assess internal psychological states such as feelings of belonging.
From previous research studies, it is clear that a questionnaire is good at collecting comprehensive data from a single person and that the data of different persons are highly consistent in the aspects they cover. The collected data can be good training material for human-focused opinion mining to learn the inner connection between students' behaviors and their state of mind. In this paper, the combination of the two methods overcomes the peculiarities described above and can make precise predictions of students' state of mind.

Data Collection and Pretreatment
This section introduces the source of the data on college students' state of mind and describes the pretreatment of the data, including outlier detection and dimension reduction, both of which are based on SVD.

Data Source.
The data used in the experiment come from a survey on students' state of mind that was conducted by Northwestern Polytechnical University in September 2017. The surveyed students were from different grades (including some master's and doctoral students). After screening and checking, the total number of valid samples is 953.
The questionnaire consists of 30 questions, which are designed to cover most aspects of students' daily life and their opinions. In terms of content, these questions can be divided as follows: (1) basic information: gender, grade, subject, and so on; (2) individual development: personal development since entering university and plans after graduation; (3) focus of attention: recent events that the student follows; (4) mind identity: agreement with some policies and opinions; (5) school work evaluation: satisfaction with the school's work and directions for improvement. In terms of form, these questions can be divided into single-choice questions, multiple-choice questions, scale questions, and essay questions.
Questions of different types need different primary pretreatments to produce the original data set. Each option in a single-choice or multiple-choice question is expanded into an independent variable whose value is decided by whether the option is selected or not; the answers to scale questions are added to the data set directly; the essay questions are ignored because most respondents left them blank. After primary pretreatment, each sample vector has 160 dimensions. One of the variables is selected as the sample label, and the rest are the features of students. The sample label is given according to the students' evaluation of their own state of mind: label 1 is positive, which means that the student does not need to be guided; label 0 means that the student is not mature and needs to be guided.
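The expansion of choice questions into indicator variables can be sketched as follows. This is a minimal illustration, not the survey's actual encoding: the helper names, the number of options, and the example respondent are hypothetical.

```python
import numpy as np

def expand_single_choice(answer, n_options):
    """One indicator variable per option; exactly one of them is set."""
    row = np.zeros(n_options, dtype=int)
    row[answer] = 1
    return row

def expand_multiple_choice(answers, n_options):
    """One indicator variable per option; every ticked option is set."""
    row = np.zeros(n_options, dtype=int)
    row[sorted(answers)] = 1
    return row

# Hypothetical respondent: option 1 of a 3-option single-choice question,
# options 0 and 2 of a 4-option multiple-choice question, and a scale answer of 4.
sample = np.concatenate([
    expand_single_choice(1, 3),
    expand_multiple_choice({0, 2}, 4),
    [4],                      # scale answers are copied into the vector as-is
])
print(sample)
```

Three questions already produce eight variables here, which is how 30 questions can grow into a 160-dimensional sample vector.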

Meaning of SVD.
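SVD can be considered as the generalization of eigendecomposition from square matrices to matrices of any size [18]. In this case, the original data set is $S \in \mathbb{R}^{m \times n}$, which means that there are $m$ samples in the data set and each sample has $n$ features. After the SVD process, there are orthogonal matrices $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ that represent $S$ as follows:

$$S = U \Sigma V^{T}, \tag{1}$$

where $\Sigma = \begin{bmatrix} \Sigma_n \\ 0 \end{bmatrix} \in \mathbb{R}^{m \times n}$, $\Sigma_n = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ is a diagonal matrix, and $0$ is the zero matrix. The $\sigma_i$ are the singular values of $S$ sorted in descending order. If $0$ is removed, the related vectors in $U$ can be deleted so that $U \in \mathbb{R}^{m \times n}$ and $\Sigma \in \mathbb{R}^{n \times n}$.

An $n$-dimensional coordinate system can be established in the space of student samples, with axes that correspond to the sample features, so that every student sample can be represented by a point.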
The coordinate of sample $i$ is $s_i$, which is the $i$th row vector of $S$. Then, the process of SVD can be considered as a coordinate transformation within the sample space, and each column vector $v_i$ of $V$ represents a base vector of the new coordinate system. The new base vectors can be given abstract meanings according to their relationship with the original features. All the new base vectors are perpendicular to each other because $V$ is an orthogonal matrix. Let $\hat{S} = U \Sigma$, so that

$$\hat{S} = S V. \tag{2}$$

From (2), it can be found that each row vector $\hat{s}_i$ of $\hat{S}$ represents the coordinate of a sample in the new coordinate system. Meanwhile, the singular values that relate to the different base vectors represent the dispersion of the samples in these directions. If a singular value is large, the samples' projections on its related base vector are widely distributed, which means that abundant information is stored there.
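The coordinate transformation in (2) can be checked numerically. The sketch below uses a small random matrix as a stand-in for the questionnaire data (an assumption for illustration only) and NumPy's reduced SVD, which returns $V^{T}$ rather than $V$.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(6, 3))       # toy data: m = 6 samples, n = 3 features

# Reduced SVD: U is m x n, sigma holds the n singular values, Vt is V transposed.
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
V = Vt.T

S_hat = U * sigma                 # same as U @ np.diag(sigma)

print(np.allclose(S_hat, S @ V))                           # equation (2)
print(np.allclose(V.T @ V, np.eye(3)))                     # base vectors are orthonormal
print(np.allclose(np.linalg.norm(S_hat, axis=0), sigma))   # spread along each new axis
```

The last check shows why a singular value measures dispersion: the length of each column of $\hat{S}$, that is, the spread of all samples along one new axis measured from the origin, equals the corresponding singular value.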

Application of SVD in Outlier Detection.
Because a larger singular value corresponds to a base vector along which the samples are widely scattered, a bias along a base vector with a small singular value contributes more to a sample's deviation. As a result, the bias on a base vector with a small singular value should be given a high weight when calculating the total deviation of a sample. Before the deviation is calculated, the singular values are sorted in descending order as $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$, and a weight is computed for each base vector from its singular value. The bias of student sample $i$ on the new base vector $v_j$ is represented by its Z-score (4), that is, the distance of $\hat{s}_{ij}$ from $\mu_j$ measured in standard deviations of column $j$, where $\hat{s}_{ij}$ is an element of $\hat{S}$ and $\mu_j$ is the mean of all elements in the column vector $\hat{s}_j = (\hat{s}_{1j}, \hat{s}_{2j}, \ldots, \hat{s}_{mj})$. The total deviation of a sample is then calculated from its weighted biases on all base vectors.

After the deviations of all samples have been calculated, a self-adapting threshold is set. If a sample's deviation exceeds the threshold, the sample is deleted as an outlier to make the data set more credible. A training set with high reliability improves the generalization performance of the predictor.
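The outlier detection step can be sketched as below. The weight and total-deviation formulas are not reproduced here, so this sketch makes two labeled assumptions: the weights are taken inversely proportional to the singular values, and the total deviation is the weighted sum of absolute Z-scores. The threshold is the mean of the deviations plus two standard deviations, in the spirit of the self-adapting threshold used in Section 5. The function name and the toy data are hypothetical.

```python
import numpy as np

def svd_outlier_mask(S, n_sigma=2.0):
    """Flag samples whose weighted deviation exceeds mean + n_sigma * std."""
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    S_hat = U * sigma                          # coordinates in the new basis

    eps = 1e-12
    # ASSUMPTION: weights inversely proportional to the singular values.
    w = 1.0 / (sigma + eps)
    w /= w.sum()

    # Z-score of every sample on every new base vector.
    Z = (S_hat - S_hat.mean(axis=0)) / (S_hat.std(axis=0) + eps)

    # ASSUMPTION: total deviation = weighted sum of absolute Z-scores.
    deviation = np.abs(Z) @ w
    threshold = deviation.mean() + n_sigma * deviation.std()
    return deviation > threshold, deviation, threshold

# Toy data: 200 regular samples plus one sample that is far off along the
# direction in which the regular samples barely vary.
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 5)) * np.array([10.0, 5.0, 1.0, 0.5, 0.1])
S = np.vstack([S, [0.0, 0.0, 0.0, 0.0, 5.0]])

mask, deviation, threshold = svd_outlier_mask(S)
print("flagged:", int(mask.sum()), "planted outlier flagged:", bool(mask[-1]))
```

The planted sample is unremarkable on the high-variance axes, so it is the high weight on the low-variance direction that exposes it.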

Application of SVD in Dimension Reduction.
A larger singular value relates to more information, which means that the singular values can be used to help reduce the dimension of the data set. The specific way to reduce the dimension is to delete the singular values with small values and their related vectors in $U$ and $V$. Then, the matrices can be reconstructed as $U' \in \mathbb{R}^{m \times k}$, $\Sigma' \in \mathbb{R}^{k \times k}$, and $V' \in \mathbb{R}^{n \times k}$, where $k$ is the number of reserved singular values, and formula (1) will be written as

$$S' = U' \Sigma' V'^{T}.$$

However, even some new base vectors with small singular values might have a high correlation with the label, which means that they can help increase the classification accuracy of the predictor. To protect them, the correlation between a base vector and the sample label should be added to the criterion. The importance score of a base vector is therefore calculated from its singular value together with $c_i$ and $v_{ij}$, where $c_i$ is the correlation between an original feature and the label and $v_{ij}$ is an element of $V$, which represents the relationship between the original features and the new base vectors. The amount of information carried by a matrix can be measured by its Frobenius norm (F-norm). The F-norm of $S$ is calculated by the following equation:

$$\|S\|_F = \sqrt{\sum_{i=1}^{n} \sigma_i^{2}},$$

where the singular values $\sigma_i$ are sorted by their scores in descending order. After the base vectors with smaller scores have been deleted, the amount of remaining information can be represented by the F-norm of $S'$, and the percentage of the information reserved can be calculated by the following equation:

$$\frac{\|S'\|_F}{\|S\|_F} = \sqrt{\frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sum_{i=1}^{n} \sigma_i^{2}}},$$

where $k$ is the number of reserved base vectors.
The reduction in the dimension of the sample space can prevent the overfitting caused by the sparsity of samples and strengthen the generalization performance of the predictor. Furthermore, because the noise carried by the data set is more likely to have a smaller variance than the useful information, dimension reduction can also weaken the impact of random noise on the data set.
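A minimal sketch of dimension reduction by retained F-norm follows. It is a simplification under a stated assumption: base vectors are ranked by singular value alone, whereas the method described above ranks them by an importance score that also involves the correlation with the label. The function name and the toy data are hypothetical.

```python
import numpy as np

def reduce_dimension(S, keep=0.90):
    """Keep the fewest leading base vectors whose F-norm share reaches `keep`."""
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

    # Share of the Frobenius norm retained by the first k singular values.
    retained = np.sqrt(np.cumsum(sigma ** 2) / np.sum(sigma ** 2))

    # ASSUMPTION: ranking by singular value only (no label-correlation score).
    k = min(int(np.searchsorted(retained, keep)) + 1, len(sigma))

    S_reduced = U[:, :k] * sigma[:k]           # coordinates on the kept base vectors
    V_k = Vt[:k].T                             # n x k matrix of kept base vectors
    return S_reduced, V_k, float(retained[k - 1])

rng = np.random.default_rng(0)
S = rng.normal(size=(50, 8)) * np.linspace(5.0, 0.2, 8)

S_reduced, V_k, share = reduce_dimension(S, keep=0.90)
print(S.shape, "->", S_reduced.shape, "retained share:", round(share, 3))
```

New samples can be mapped into the reduced space with `x @ V_k`, which is the row-wise form of $\hat{S}' = S V'$.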

Prediction on Students' State of Mind
This section describes how the back-propagation (BP) algorithm can be used to train an NN for predicting students' state of mind. However, it is found that using only the BP algorithm leads to overfitting, so a new algorithm that combines BP with GA is proposed to relieve overfitting. After an NN with good generalization performance is obtained, a method of calculating the importance of different features is also proposed.

BP-NN.
The BP algorithm is a common algorithm in ML, so an NN trained by the BP algorithm is first established to predict students' state of mind. After dimension reduction, the data of the student samples can be represented by $\hat{S}' \in \mathbb{R}^{m \times k}$. Here, $m$ is not the total number of student samples but the number of samples that remain after the outliers have been deleted from the data set, and $k$ is the number of new features retained for each student sample. In principle $\hat{S}' = U' \Sigma'$, but in practice $\hat{S}'$ is Z-scored by (4) so that each feature has zero mean and unit variance. This pretreatment balances the learning rates of the parameters in different nodes. Then, a data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ is obtained, where $x_i = \hat{s}'_i$ is a row vector of $\hat{S}'$ and $y_i$ is the label of the $i$th student sample.
The NN used for prediction has three layers. The input layer consists of $k$ nodes for inputting the data vector $x_i$. The output layer has only one node, which outputs the prediction $\hat{y}_i$ of a sample. The number of nodes $l$ in the hidden layer is adjustable to fit the actual demand. $I_i$, $H_h$, and $O$, respectively, represent the $i$th input node, the $h$th hidden node, and the output node. The parameters of the NN include the connection weights $\omega_{ih}$ between $I_i$ and $H_h$, the connection weights $\nu_h$ between $H_h$ and $O$, the thresholds $\gamma_h$ of $H_h$, and the threshold $\theta$ of $O$. The thresholds of the nodes make the NN a nonlinear function, so $f(x)$ is used as its equivalent function, and the output of the NN for sample $i$ is $\hat{y}_i = f(x_i)$. The optimization goal of the BP algorithm is usually the mean square error (MSE) $E$ between the outputs and the labels. The BP algorithm adjusts the parameters along the direction opposite to the gradient of $E$ to decrease the error between the prediction and the real label. For example, the variation of $\omega_{ih}$ in each training round can be calculated by the following equation:

$$\Delta \omega_{ih} = -\mu \frac{\partial E}{\partial \omega_{ih}},$$

where $\mu$ is the learning rate, which decides the speed of training.
Let the function from a student's features $x_i$ to his or her state of mind $y_i$ be $F(x)$. The BP algorithm rapidly decreases the difference between $f(x)$ and $F(x)$, so that the trained NN can be used as a predictor of students' state of mind.
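A three-layer network of this shape and one BP update can be sketched as follows. The activation function and the exact form of the MSE are assumptions made for this sketch (sigmoid activations on both layers and $E$ equal to the mean of the squared errors); the variable names mirror the symbols $\omega$, $\gamma$, $\nu$, $\theta$, and $\mu$, and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(X, W, gamma, nu, theta):
    """W: (k, l) input-hidden weights, gamma: (l,) hidden thresholds,
    nu: (l,) hidden-output weights, theta: output threshold."""
    H = sigmoid(X @ W - gamma)            # hidden outputs, shape (m, l)
    y_hat = sigmoid(H @ nu - theta)       # predictions, shape (m,)
    return H, y_hat

def bp_step(X, y, W, gamma, nu, theta, mu=0.5):
    """One full-batch gradient step on E = mean((y_hat - y)^2). ASSUMPTION."""
    m = len(y)
    H, y_hat = forward(X, W, gamma, nu, theta)
    g_out = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat) / m    # dE/d(output input)
    g_hid = np.outer(g_out, nu) * H * (1.0 - H)              # dE/d(hidden input)
    # Each parameter moves against its gradient; thresholds enter with a
    # minus sign in the forward pass, hence the plus sign in their update.
    W_new = W - mu * (X.T @ g_hid)
    gamma_new = gamma + mu * g_hid.sum(axis=0)
    nu_new = nu - mu * (H.T @ g_out)
    theta_new = theta + mu * g_out.sum()
    return W_new, gamma_new, nu_new, theta_new

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))                      # toy Z-scored features, k = 3
y = (X[:, 0] + X[:, 1] > 0).astype(float)         # toy binary labels
l = 4
W, gamma = rng.normal(scale=0.5, size=(3, l)), np.zeros(l)
nu, theta = rng.normal(scale=0.5, size=l), 0.0

mse_before = float(np.mean((forward(X, W, gamma, nu, theta)[1] - y) ** 2))
for _ in range(500):
    W, gamma, nu, theta = bp_step(X, y, W, gamma, nu, theta)
mse_after = float(np.mean((forward(X, W, gamma, nu, theta)[1] - y) ** 2))
print(mse_before, mse_after)
```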

Description and Analysis on Overfitting.
However, the BP algorithm did not work well in the primary experiment. To test the usefulness of the predictor, the data set $D$ was randomly divided into a training set $D_{\text{train}}$ and a test set $D_{\text{test}}$. Figure 1 shows that the MSE of the NN's predictions evolves differently on $D_{\text{train}}$ and on $D_{\text{test}}$.
As the number of training rounds increases, the MSE of the NN on the training set approaches 0, which means that the predictor fits the training set well. However, the MSE on the test set remains large. This indicates that a well-trained NN may not have high prediction accuracy on fresh samples. The F1-measure (the harmonic mean of recall and precision) can represent the prediction accuracy of the predictor: the mean F1-measure on the training set is 0.97, while the mean F1-measure on the test set is only 0.76. This means that overfitting occurs.
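For reference, the F1-measure can be computed as in the sketch below. Treating label 1 as the positive class is an assumption of this example, and the labels and predictions are made up for illustration.

```python
def f1_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0                      # no correct positive prediction at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels and predictions: 3 true positives, 1 false positive,
# 1 false negative, so precision = recall = 0.75.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(f1_measure(y_true, y_pred))
```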
Generally, the noise and unrelated features carried by the training set are considered to be the cause of overfitting [19]. As the function from students' features to their state of mind is $F(x)$, the influence on the state of mind caused by noise and unrelated features can be defined as a characteristic function $N(x)$. Then the function of the student samples in the training set is $F_{\text{train}}(x) = F(x) + N_{\text{train}}(x)$, and the function of the student samples in the test set is $F_{\text{test}}(x) = F(x) + N_{\text{test}}(x)$.

All of the parameters of the NN can be represented as $p = (\omega_{11}, \ldots, \omega_{kl}, \nu_1, \ldots, \nu_l, \gamma_1, \ldots, \gamma_l, \theta)$. Let the MSE function $M(p)$ map $p$ to the MSE between the predictions and the labels; the parameters of the optimal NN are then the global minimum point $p^{g}$ of $M(p)$. Because $F_{\text{train}}(x)$ and $F_{\text{test}}(x)$ differ, $M_{\text{train}}(p)$ and $M_{\text{test}}(p)$ differ as well, and so do $p^{g}_{\text{train}}$ and $p^{g}_{\text{test}}$. If an NN selects $p^{g}_{\text{train}}$ as its optimal parameters, it will fit $F_{\text{train}}(x)$ well, but its prediction accuracy on the test set may be poor.
That is why overfitting occurs in BP-NN. In fact, the above method changes the criterion from considering only the MSE value to considering both the MSE value and the similarity with the test set. The task of optimizing the MSE value can be handled by the BP algorithm as usual. The similarity with the test set is difficult to quantify, but it can be indicated by the F1-measure of the predictions on the test set.

A Method of Relieving
To improve the similarity, an evolutionary algorithm is needed, so GA is introduced. It is known that, when the training set is fixed, different initial parameters $p^{i}$ can make the network converge to different local minima $p^{l}$ [20], and the training process can be represented as $p^{l} = T(p^{i})$. Therefore, $p^{i}$ can be regarded as an individual of the population in GA, and the population can be represented by $P = (p^{i}_{1}, p^{i}_{2}, \ldots, p^{i}_{N})$, where $N$ is the population size. After the NN with initial parameters $p^{i}$ has been trained, the fitness of the individual is calculated as the F1-measure on the test set. Through the operations of mating, mutation, and selection over the generations, $p^{i}$ tends to lead to a network that fits both the training set and the test set well.
However, using $D_{\text{test}}$ to calculate the fitness of individuals means that $D_{\text{test}}$ is also involved in the closed loop of the algorithm, so it no longer represents fresh samples. To test whether the generalization performance of the NN is improved, it is necessary to set aside a set of samples before the algorithm starts in order to show the change in prediction accuracy on fresh samples. This sample set is called the verification set $D_{\text{ver}}$.
The primary experiment showed that if the $D_{\text{test}}$ used to calculate fitness is kept constant, the prediction accuracy on the test set improves greatly, but the prediction accuracy on the verification set does not change distinctly. This may be caused by the difference between $F_{\text{test}}(x)$ and $F(x)$. To make the algorithm effective, $D_{\text{test}}$ is randomly divided into three parts that serve as temporary test sets $D_{\text{temp}}$, and a punishment is added if the network performs well on only one of the temporary test sets. The fitness of individuals is then calculated by (12), in which $F1_i$ is the F1-measure on the $i$th $D_{\text{temp}}$ and $F1_{\text{mean}}$ is the mean of the $F1_i$. After this modification of the algorithm, each individual faces different $D_{\text{temp}}$ sets when its fitness is calculated. This method dilutes $N_{\text{test}}(x)$, so that the evolutionary direction is to fit $F(x)$ rather than $F_{\text{test}}(x)$, which helps the NN achieve better generalization performance.
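The idea of punishing uneven performance across the temporary test sets can be illustrated as follows. The exact fitness formula is not reproduced here, so the penalty used in this sketch, the mean F1-measure minus a multiple of the standard deviation of the three F1-measures, is purely an assumption, as are the function names and the `penalty` parameter.

```python
import numpy as np

def split_into_temp_sets(n_test, n_parts=3, rng=None):
    """Randomly partition the indices of D_test into n_parts temporary test sets."""
    rng = np.random.default_rng() if rng is None else rng
    return np.array_split(rng.permutation(n_test), n_parts)

def penalized_fitness(f1_scores, penalty=1.0):
    """ASSUMPTION: fitness = mean F1 minus a penalty on the spread of the F1 values."""
    f1 = np.asarray(f1_scores, dtype=float)
    return float(f1.mean() - penalty * f1.std())

parts = split_into_temp_sets(185, rng=np.random.default_rng(0))
print([len(part) for part in parts])

# Same mean F1-measure, but the second network is good on only one temporary set.
print(penalized_fitness([0.8, 0.8, 0.8]))
print(penalized_fitness([1.0, 0.7, 0.7]))
```

Calling `split_into_temp_sets` again for each individual gives every individual different temporary sets, which matches the description above.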
It has been found that overfitting when predicting students' state of mind is caused by the difference between $F_{\text{train}}(x)$ and $F(x)$, so the above algorithm is beneficial. As shown in Figure 2, the whole data set is first divided into $D_{\text{train}}$, $D_{\text{test}}$, and $D_{\text{ver}}$. Then, the initial population $P$ is generated randomly, and all of the NNs with initial parameters $p^{i}$ are trained on the same training set. To keep the genetic advantage of individuals with high fitness, an elite strategy is used when generating the next population. This strategy produces offspring by mating and mutation before selecting the individuals of the next population, and the number of offspring equals the size $N$ of the current population. All of the individuals in the current population and the offspring are sorted by fitness in descending order, and the first $N$ individuals are selected as the next population. When the maximum number of generations is reached, the individual whose NN has the highest prediction accuracy on $D_{\text{ver}}$ is chosen as the optimal solution $p^{o}$. Finally, a new network whose initial parameters are $p^{o}$ is trained on the whole data set to obtain a predictor that can predict students' state of mind with high accuracy.
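The elite strategy described above can be sketched as a small evolutionary loop over initial parameter vectors. Several details are assumptions of this sketch: uniform crossover, Gaussian mutation applied gene by gene, and a toy fitness function that stands in for "train the NN from this initial vector and score it on the temporary test sets". The names `evolve` and `toy_fitness` are hypothetical.

```python
import numpy as np

def evolve(population, fitness_fn, n_generations=50,
           crossover_rate=0.7, mutation_rate=0.3, rng=None):
    """Elite-strategy GA: parents and offspring compete, the best N survive."""
    rng = np.random.default_rng() if rng is None else rng
    population = np.asarray(population, dtype=float)
    n, dim = population.shape
    best_history = []
    for _ in range(n_generations):
        # Mating (ASSUMPTION: uniform crossover between two random parents).
        parents_a = population[rng.integers(n, size=n)]
        parents_b = population[rng.integers(n, size=n)]
        crossed = rng.random((n, 1)) < crossover_rate
        take_b = (rng.random((n, dim)) < 0.5) & crossed
        offspring = np.where(take_b, parents_b, parents_a)
        # Mutation (ASSUMPTION: Gaussian noise on randomly chosen genes).
        mutated = rng.random((n, dim)) < mutation_rate
        offspring = offspring + mutated * rng.normal(scale=0.1, size=(n, dim))
        # Elite selection over the current population plus its N offspring.
        candidates = np.vstack([population, offspring])
        fitness = np.array([fitness_fn(p) for p in candidates])
        order = np.argsort(-fitness)[:n]
        population = candidates[order]
        best_history.append(float(fitness[order[0]]))
    return population, best_history

def toy_fitness(p_init):
    # Stand-in for training an NN from p_init and scoring it; higher is better.
    return -float(np.sum((p_init - 1.0) ** 2))

rng = np.random.default_rng(0)
initial_population = rng.normal(size=(10, 6))     # N = 10 individuals
final_population, best_history = evolve(initial_population, toy_fitness, rng=rng)
print(best_history[0], best_history[-1])
```

Because the parents stay in the candidate pool, the best fitness cannot decrease from one generation to the next as long as the fitness function is deterministic. If the fitness is re-evaluated on freshly drawn temporary test sets, that guarantee no longer holds exactly.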
4.4. Feature Importance Calculation.
The BP algorithm combined with GA yields a predictor of students' state of mind with high accuracy, but the predictor is a "black box," which means that the mechanism by which it makes predictions is still unknown. Though the NN is known as an opaque method, there are still some ways to evaluate the importance of different features [21]. After an NN is trained on $D$, it has nearly perfect prediction accuracy on $D$. But if one feature is sheltered in each student sample (that is, replaced by the mean of this feature, which is 0 after Z-scoring), the accuracy is likely to drop [22]. The importance of the features in $D$ can then be calculated from $F1_0$ and $F1'_i$, where $F1_0$ is the accuracy before sheltering and $F1'_i$ is the accuracy after the $i$th feature has been sheltered. The features in $D$ are abstractions of the original features, which means that the importance of the original features can also be calculated by using $V'$, which represents the relationship between the new features and the original features. An importance analysis of a single NN lacks credibility, so the total importance of the original features is accumulated after analyzing 500 NNs. The importance values can help colleges know what matters in guiding students' state of mind.
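The sheltering procedure can be sketched as below. Three points are assumptions of this sketch rather than the formulas of this section: the importance is taken as the plain drop in score, plain accuracy replaces the F1-measure to keep the example short, and the importance of the original features is obtained by multiplying with the matrix of kept base vectors. The predictor and data are toys.

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def sheltering_importance(predict, X, y, score_fn=accuracy):
    """Drop in score when each Z-scored feature is replaced by its mean (0)."""
    base = score_fn(y, predict(X))
    importance = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_sheltered = X.copy()
        X_sheltered[:, j] = 0.0                    # mean of a Z-scored feature
        importance[j] = base - score_fn(y, predict(X_sheltered))
    return importance

def to_original_features(V_k, importance):
    """ASSUMPTION: project new-feature importance back through V' (n x k)."""
    return V_k @ importance

# Toy predictor that depends on the first new feature only.
def predict(X):
    return (X[:, 0] > 0).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = predict(X)                                     # labels the toy predictor fits exactly

importance = sheltering_importance(predict, X, y)
V_k = np.eye(4)[:, :3]                             # hypothetical 4 x 3 basis matrix
print(importance, to_original_features(V_k, importance))
```

Summing the result over many trained networks, as described above, would only require accumulating the returned vectors.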

Experiment Results and Discussion
This section presents the experimental results that support the hypotheses proposed above. They also show the actual effectiveness of the new method in predicting students' state of mind.

Implementation Details.
The first step of the experiment is the pretreatment of the data. The raw data collected are already numbered according to the order of the answers or according to whether an answer has been ticked, and the data set expansion strategy described in Section 3.1 is used to make them regular. Then, the whole data set is regarded as a matrix, and SVD is performed. The matrices obtained from SVD are used in the processes of outlier detection and dimension reduction, which are described in Sections 3.3 and 3.4. After pretreatment, each sample is represented by a vector in the new coordinate system and a label of state of mind. Then, the samples are used to run the algorithm described in Section 4.3. The algorithm trains many NNs, and the NN with the best generalization performance is selected to make precise predictions of students' state of mind. The rest of the trained NNs also reveal part of the inner connection between students' behaviors and their state of mind, so all of the trained NNs are used to calculate the importance of different features by the method described in Section 4.4.

Experiment on Data Pretreatment.
The pretreatment of the data set includes outlier detection and dimension reduction, both of which have been detailed in Section 3. After the deviations of all samples have been calculated, their distribution is shown in Figure 3. The distribution of the samples' deviations roughly conforms to a normal distribution. The normal distribution shown in Figure 3 is obtained by approximately fitting the original distribution. Its mean is $\mu = 105.18$, and its standard deviation is $\sigma = 13.97$. Then the self-adapting threshold is calculated as threshold $= \mu + 2\sigma = 133.12$, and the number of outliers is 27.
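As a quick check of the arithmetic, using only the fitted values above and the sample counts reported in this paper:

$$\text{threshold} = \mu + 2\sigma = 105.18 + 2 \times 13.97 = 133.12, \qquad 953 - 27 = 926.$$

The second figure is the number of samples that remain after the 27 outliers are removed from the 953 valid questionnaires, which matches the 926 samples used in the overfitting relief experiment.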
After the content of the deleted outliers was checked, many of them were found to contain contradictions. For example, one student said that his counselor is the person who gives the best help in the development of his state of mind, but he also said that he was dissatisfied with the counselors' work. This might be his real thought, but it would confuse the predictor, so his sample is removed from the sample set. Other outliers contain mistakes made while answering questions, such as selecting multiple options on a single-choice question. Such mistakes also change the potential importance of a variable, so these samples should be removed. The experimental result on outlier detection shows that the algorithm is effective and reasonable. The application of outlier detection purifies the data set and helps improve the generalization performance of the predictor.
Figure 4 shows how the number of reserved inputs changes as the percentage of reserved information, calculated by the F-norm, increases. The number of inputs increases sharply when the percentage is large. This confirms that many inputs relate to small singular values and have a low correlation with the label, so they should be deleted to reduce the dimension of the sample space. In the experiment, 90% of the information is reserved, and the number of inputs changes from 159 to 71.

Experiment Results on Overfitting Relief.
The overfitting relief experiment is designed according to the algorithm described in Section 4. Abnormal samples have already been excluded by the outlier detection based on SVD, and 926 samples are used. The samples are assigned at random: 70% are selected as $D_{\text{train}}$, 20% are selected as $D_{\text{test}}$, and the remaining 10% are used as $D_{\text{ver}}$. The number of individuals in a population is set to 10, and the maximum number of generations is set to 50. The population crossover rate is 0.7, and the mutation rate is 0.3.

Figure 6(a) shows that the prediction accuracy on $D_{\text{test}}$ does not reach the same increase as that shown in Figure 5(a). But on $D_{\text{ver}}$, the prediction accuracy increases obviously. This suggests that the hypothesis in Section 4 is reliable.
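The random 70%/20%/10% partition of the 926 samples can be sketched as follows. The function name, the seed, and the rounding of the set sizes are choices made for this illustration.

```python
import numpy as np

def split_indices(n_samples, fractions=(0.7, 0.2, 0.1), seed=0):
    """Randomly partition sample indices into training, test, and verification sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_test = int(round(fractions[1] * n_samples))
    train = order[:n_train]
    test = order[n_train:n_train + n_test]
    ver = order[n_train + n_test:]          # the rest forms the verification set
    return train, test, ver

train_idx, test_idx, ver_idx = split_indices(926)
print(len(train_idx), len(test_idx), len(ver_idx))
```

The verification indices are drawn once, before any evolution starts, so that $D_{\text{ver}}$ never takes part in the fitness calculation.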
In Figure 6(b), it can be observed that the mean F1-measure of the initial population on the verification set is 0.7662. After the evolution, the mean F1-measure of the last generation reaches 0.8080. When the network is applied as a predictor in real engineering, the individual with the best accuracy in the current population will be chosen. The data show that the highest F1-measure is 0.8315, which represents high prediction accuracy. With reliable predictions of students' state of mind, colleges can give effective guidance to help students get rid of the bad influences of the information explosion.

The importance values of the original features are partly shown in Table 1. The features are sorted by their absolute importance, and only the top 10 are listed in Table 1. Some features relate positively to students' state of mind, while others relate negatively. Also, some features show a potential causal relationship with state of mind, such as "focus on politics news" and "benefit more from ideology education," while others may not, such as "use QQ often" and "benefit more from club activities," though they are correlated with it. It can also be noticed that the features classified as "individual development" have the greatest power in indicating students' state of mind.

Experiment on
By analyzing feature importance, colleges can identify the key points of guidance on students' state of mind. For example, they can help students take an interest in political news, strengthen the relationship between students and their tutors, and enrich social practice. These macroscopic policies can be combined with the microscopic guidance on students' state of mind to enhance overall effectiveness.

Conclusion
The main purpose of this paper is to adapt the traditional opinion mining method so that it is effective when predicting students' state of mind.
The adapted method requires more aspects of data for each sample, and a questionnaire is a good way to obtain comprehensive data about students. However, expanding the dimension of the sample space makes the samples sparser and causes overfitting while training the NN. To solve this problem, SVD is used to reduce the dimension of the sample space directly, and a closed loop based on GA is added to help the NN achieve better prediction accuracy on fresh samples. The result of the experiment shows that the new algorithm works well and that the predictor obtained has good generalization performance. A simple method of calculating feature importance is also proposed, which can help colleges make policies.
The new method lets the predictor make reliable predictions of students' state of mind. With these predictions, colleges can apply guidance associated with students' personal experience, which will make the guidance more genial and effective. Furthermore, the macroscopic policies made according to feature importance can supplement microscopic guidance to achieve better effectiveness.
For further research, the questionnaire used to collect data will be redesigned. The aim of the questions should be more covert to make sure that real information is collected, and the content of the questions should be more varied, especially in "individual development," to collect more data that might be necessary. Also, the classification problem on students' state of mind will be changed into a quantification problem that yields a specific score for each student on different aspects of state of mind. The method of calculating feature importance will be improved too.
These future studies will further strengthen the effect of guidance on students' state of mind.

Figure 1: MSE on different data sets.

Figure 2: Structure of the algorithm.
Figures 5(a) and 5(b) show the change in the population's mean F1-measure on different data sets when a constant $D_{test}$ is used to calculate individuals' fitness, and Figures 6(a) and 6(b) show the changes when $D_{temp}$ is used to calculate fitness. Figures 5(a) and 5(b) show the different tendencies of the population's mean F1-measure on $D_{test}$ and $D_{ver}$. The prediction accuracy on $D_{test}$ increases clearly, but the prediction accuracy on $D_{ver}$ decreases slightly, which means that the population tends to fit $F_{test}(x)$ when $D_{test}$ is used to calculate fitness. The corresponding behavior when the $D_{temp}$ sets are used is shown in Figure 6.

Figure 5: Using a constant test set to calculate fitness. (a) Test set. (b) Verification set.

Figure 6: Using a temporary test set to calculate fitness. (a) Temporary test set. (b) Verification set.
$M_{train}(p)$ and $M_{test}(p)$ should be approximately equal. This can be verified by putting "prospect holes" on them. Putting "prospect holes" means using the same input $p_0$ to predict on different sample sets and comparing $M_{train}(p_0)$ with $M_{test}(p_0)$. After putting "prospect holes" randomly 1000 times, the calculated mean percentage of MSE difference is 1.14%, which verifies the approximation. As a result, there can be several similar local minimum points $p^{l}$ in the different $M(p)$. If the NN with parameters $p^{g}_{train}$ does not fit the test set, then one of the $p^{l}_{train}$ values that has high similarity with one of the $p^{l}_{test}$ values can be used as the approximate optimal solution. Although $M_{train}(p^{l}_{train})$ is slightly larger than $M_{train}(p^{g}_{train})$, this solution fits both the training set and the test set well, which means it has good generalization performance.
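The "prospect hole" check described above can be sketched as follows. The sketch evaluates the same randomly drawn parameter vector on a training set and a test set and averages the relative MSE gap over 1000 draws. It rests on several assumptions that do not come from the paper: the data are synthetic, a small one-hidden-layer network with 8 hidden units stands in for the actual NN, and the "percentage of MSE difference" is taken to be $|M_{train}(p_0) - M_{test}(p_0)| / M_{train}(p_0)$. The number it prints is therefore not comparable to the reported 1.14%.

```python
import numpy as np

def mse(params, X, y):
    """MSE of a hypothetical one-hidden-layer network with fixed parameters."""
    W1, b1, w2, b2 = params
    hidden = np.tanh(X @ W1 + b1)
    pred = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # sigmoid output
    return float(np.mean((pred - y) ** 2))

def random_params(rng, n_in, n_hidden):
    """Draw one random parameter vector p0 (one 'prospect hole')."""
    return (rng.normal(size=(n_in, n_hidden)), rng.normal(size=n_hidden),
            rng.normal(size=n_hidden), rng.normal())

def mean_mse_gap(X_train, y_train, X_test, y_test,
                 n_probes=1000, n_hidden=8, seed=0):
    """Mean relative gap (in percent) between train and test MSE at the same p0."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(n_probes):
        p0 = random_params(rng, X_train.shape[1], n_hidden)
        m_train = mse(p0, X_train, y_train)
        m_test = mse(p0, X_test, y_test)
        gaps.append(abs(m_train - m_test) / m_train)   # assumed definition
    return 100.0 * float(np.mean(gaps))

# Synthetic stand-in data with the train / test sizes implied by the 70/20 split.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(833, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(float)
gap = mean_mse_gap(X[:648], y[:648], X[648:], y[648:])
print(f"mean MSE gap over 1000 probes: {gap:.2f}%")
```

A small gap means that the two error surfaces have nearly the same height at randomly probed points, which is the property used to argue that similar local minima exist in both.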