Broad Learning-Based Optimization and Prediction of Questionnaire Survey: Application to Mind Status of College Students

The mind status of college students is important because it reflects the direction in which public opinion is moving. Only with accurate prediction can corresponding actions be taken to prevent a situation from getting worse. This paper focuses on data analysis using the recently developed broad learning method to obtain a learning model that can then be used for prediction. First, a questionnaire related to the ideological state is designed. Second, the data are collected and classified using typical questions and answers. Third, for each question-answer pair, a score is obtained and used as training data for the system. Fourth, the inputs and the output are selected according to the key questions and conclusions. Finally, broad learning with a flat network, without a deep structure, is employed for data analysis. Tests show that the design using broad learning can efficiently deal with the regression problem and that the learned network can be used for prediction.


Introduction
Social phenomena are complex and challenging to study since the underlying dynamics cannot be known exactly or cannot be obtained. For example, civic education is a basic education subject to which countries all around the world attach high importance. Civic education in the UK mainly includes moral education, political education, and religious education, while in the USA it is implied in patriotic education and moral education. In Japan, civic education is called "social science," while it is called "life education" and "state consciousness education" in Singapore and "ideological and political education" in China. Because countries differ in state nature, political system, cultural tradition, education system, etc., their civic education differs considerably in form, content, and method. However, the purpose of civic education in whatever culture or system is the same: the ruling class tries to have a purposeful and planned impact on the citizens by applying certain political ideas and moral norms, so that the citizens form social moral practice in line with the needs of the ruling class. From the implementation effect of education in all countries, we can see that the civic education of each country presents specific goals according to national needs and by integrating the characteristics of the times. Therefore, being "political, ideological, and advancing with the times" is characteristic of the civic education of each country.
Given the significance of civic education, it is important to know its current status. Furthermore, a key point of the analysis is to find relations between the factors and the results, so that actions can be taken before the consequences arrive. Using a questionnaire is a useful way to assess the status. The questionnaire should be designed with a specific aim so that the results reflect a certain intention. However, the data are not easy to analyze. If only mean values and ratios are calculated, only the status of the respondents is reflected; deeper information cannot be dug out for testing, and the design cannot be used for prediction. For comprehensive analysis, the goal of a questionnaire survey is not restricted to statistics. A possible target is to model public opinion so that prediction becomes available and suitable action can then be taken. With the abovementioned considerations, a study using machine learning is of great interest. In the literature, many methods have been analyzed using bioinspired techniques such as the genetic algorithm (GA) [1], neural networks (NNs) [2,3], the fuzzy logic system (FLS) [4], particle swarm optimization (PSO) [5,6], the extreme learning machine (ELM) [7][8][9], deep learning [10], and broad learning [11]. These methods mimic the behavior of biological systems. For example, the GA uses initialization, selection, and variation. ELM uses a feedforward NN, and its basic idea is random assignment for generalization. Deep learning is based on learning data representations. Recently, broad learning has been gaining attention. The method uses a random vector functional link NN, and the key point is to expand the network by randomly adding functions instead of increasing the depth. In [12], the feature nodes are replaced with a Takagi-Sugeno fuzzy system and the fuzzy broad learning system is proposed.
For data analysis and system control, NNs are widely studied [13,14]. For data analysis, in [15], a complex sales forecasting problem is studied, motivated by the requirement of reliable prediction; a fuzzy NN is used, with the initial weights generated by a genetic algorithm. In [16], fuzzy NN learning is proposed, and a continuous genetic algorithm is used to improve the performance. One case of global supply chain management is studied with data collection, construction of the learning method, and a decision integration model. In [17], importance-performance analysis is studied where the relationship is nonlinear and exhibits multicollinearity, and a backpropagation NN is used to train the model. In [18], similar work is done, with a case study that involves gathering data on customer perception of the focal delivered service, fuzzy set design, and attributes' implicity. In [19], market segmentation is studied using a self-organizing feature map NN and a genetic K-means algorithm. In the test, the procedure includes questionnaire design, importance analysis, satisfaction analysis, performance evaluation of the satisfaction, and discussion. In [20], restaurant service recovery is studied to see how likely consumers are to return under a recovery plan. In the design, a support vector machine and multilayer perceptrons are used to predict consumer expectations. In [21], learners' preference for visual complexity on the small screens of mobile computers is studied by NN. Using an RBF-NN model, the accuracy can be guaranteed while the investigation effort can be avoided. In [22], the fruit fly optimization algorithm is adopted to optimize the NN model, while principal component regression is used for the questionnaire survey.
From the abovementioned discussion, the questionnaire survey is widely used in many applications since it captures the principal components while the data indicate a potentially nonlinear model. Using a questionnaire survey is an effective way to analyze the status of mind. The challenge lies in how to design the questionnaire survey and how to analyze the data. Many methods have been adopted for data analysis and optimization; the basic idea is to use an NN, while other evolutionary algorithms are used to make the NN more efficient. However, the structure and the parameters have to be selected step by step. Deep learning has been widely studied recently, and great breakthroughs have been achieved, but the design is time-consuming. Broad learning is interesting because new functions are added randomly while the results are updated using the pseudoinverse method. In this way, although the flat network might grow fast, the computation does not increase as fast.
In this paper, the work is on questionnaire survey-based data analysis and prediction. The mind status of college students is collected, and the inputs and the output for the training network are selected. Furthermore, the broad learning (BL) algorithm is employed for optimization. The obtained model is verified on test data. The paper is organized as follows. In Section 2, the questionnaire survey is discussed. In Section 3, broad learning is described and the data optimization is given. In Section 4, the conclusions and future work are discussed.

Questionnaire Survey and Analysis
This survey takes college and university students in Shaanxi Province as its subjects. The total sample size required for the survey is determined according to the conservative formula for calculating the sample size and the average expected sample size, and the samples are drawn randomly by level. A total of 1236 questionnaires are distributed and 1202 effective questionnaires are recovered, giving an effective recovery rate of 97%. The questionnaires provide a comprehensive survey of the ideological and political quality of the college and university students, including their outlook on life and values, their dynamic focus on social events, and their degree of ideological and political identity. Statistical analysis is conducted after the questionnaire data are entered into the database through the SPSS software.
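The recovery rate quoted above follows directly from the two questionnaire counts. The sketch below recomputes it and, purely as an illustration, evaluates one common "conservative" sample-size formula (Cochran's formula with p = 0.5). The source does not say which formula was actually used, so the function `conservative_sample_size`, the 95% confidence level (z = 1.96), and the 5% margin of error are assumptions.

```python
import math


def effective_recovery_rate(distributed: int, effective: int) -> float:
    """Share of distributed questionnaires that came back usable."""
    return effective / distributed


def conservative_sample_size(z: float = 1.96, margin: float = 0.05) -> int:
    """Cochran's formula with p = 0.5, the worst-case (largest) variance.

    Assumption: the paper's "conservative formula" is not specified;
    this is only one widely used candidate.
    """
    p = 0.5
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


rate = effective_recovery_rate(distributed=1236, effective=1202)
print(f"effective recovery rate: {rate:.1%}")
print(f"illustrative minimum sample size: {conservative_sample_size()}")
```

Under these assumed settings the minimum sample size is far below the 1202 effective questionnaires, which is consistent with the survey being comfortably large.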
The three dimensions of the ideological and political quality of the college and university students are interconnected and complementary. Therefore, when evaluating the level of ideological and political quality, we weight the three dimensions equally (each accounts for 33.3%), take a score of 2.5 (out of a total of 5) as the baseline of the evaluation standard, set through the expert consultation method according to the confidence interval and actual needs, and work out the general mean distribution map. The index ranges are then defined as poor (2.5-3), quite poor (3-3.5), average (3.5-4), quite high (4-4.5), and high (4.5-5) ideological and political quality, so as to judge the overall level of the college and university students. The ideological and political quality level of 64.8% of the students falls in [4, 5]. Therefore, we can see that the current ideological and political quality level of the college and university students is quite high, at the middle and upper levels.
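The scoring rule above can be sketched in a few lines: an equally weighted mean of the three dimension scores, followed by a lookup of the index range. The function names are hypothetical, and two details are assumptions because the source leaves them open: a score lying exactly on a boundary is assigned to the higher band, and scores outside 2.5-5 are rejected.

```python
from bisect import bisect_right

# Upper edges separating the five bands defined on the 2.5-5 scale.
BAND_EDGES = [3.0, 3.5, 4.0, 4.5]
BAND_LABELS = ["poor", "quite poor", "average", "quite high", "high"]


def overall_score(life_values: float, social_focus: float,
                  political_identity: float) -> float:
    """Equal weighting: each dimension contributes one third."""
    return (life_values + social_focus + political_identity) / 3


def quality_band(score: float) -> str:
    """Map an overall score to its band (boundary goes to the higher band)."""
    if not 2.5 <= score <= 5.0:
        raise ValueError("score outside the 2.5-5 evaluation range")
    return BAND_LABELS[bisect_right(BAND_EDGES, score)]


score = overall_score(4.0, 4.5, 5.0)
print(score, quality_band(score))
```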
In the analysis, it is interesting to see that college students are more interested in career-related topics such as "New joint cooperation with Jingdong Company" (ratio 30.12%) and "Accommodation Condition Improvement with Air-conditioner" (ratio 49.33%), while high-technology-related news that can exhibit the university's reputation, such as "The Unmanned Aerial Vehicle Show in National Day" (ratio 45.01%), is also gaining attention. In contrast, they pay little attention to the speech on politics (ratio 15.47%) or to the results of the Outstanding Young Investigator Award of the National Natural Science Foundation (ratio 21.13%).

Data Representation and Broad Learning
The questionnaire includes 34 items for the data regression. For each question, the different answers are labeled "1-5." Furthermore, gender and grade are used as inputs. Male is labeled "0" and female is labeled "1." For the different grades, the numbers "1-6" are used as labels. In this way, the input dimension is 36. The status is taken as the regression output; in the questionnaire survey, this option has 4 levels between "positive" and "negative." Considering the diversity, 404 samples are used for data analysis and learning.
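As a concrete illustration of the labeling described above, the following sketch turns one respondent's record into a 36-dimensional input vector. The function name `encode_respondent` and the feature order (34 answers, then gender, then grade) are assumptions; the source only fixes the label values and the total dimension.

```python
from typing import List, Sequence

GENDER_CODE = {"male": 0, "female": 1}


def encode_respondent(answers: Sequence[int], gender: str, grade: int) -> List[float]:
    """Build the 36-dimensional input: 34 answers + gender + grade."""
    if len(answers) != 34:
        raise ValueError("expected 34 questionnaire answers")
    if any(a not in range(1, 6) for a in answers):
        raise ValueError("answers must be labeled 1-5")
    if grade not in range(1, 7):
        raise ValueError("grade must be labeled 1-6")
    return [float(a) for a in answers] + [float(GENDER_CODE[gender]), float(grade)]


# Hypothetical respondent: answer pattern, gender and grade are made up.
example = encode_respondent([3] * 34, gender="female", grade=2)
print(len(example))
```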
Broad learning was proposed in [11]. The main idea is to use the pseudoinverse to update the learning weights: when new enhancement nodes are added, the new weights $W^{n+1}$ are obtained from the previous weights $W^{n}$ instead of being recomputed from scratch. Furthermore, sparse features can be included to solve the optimization problem.
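The weight update mentioned above is commonly written in the broad learning literature in the following incremental form. This is a sketch of that standard form, not a transcription of this paper's own equation. The symbols are assumptions introduced here: $A^{n}$ is the current node matrix, $h$ is the block of newly added enhancement-node columns, $Y$ is the training output, and $(\cdot)^{+}$ denotes the pseudoinverse.

```latex
\begin{aligned}
A^{n+1} &= \left[\, A^{n} \mid h \,\right], \qquad
D = (A^{n})^{+} h, \qquad
C = h - A^{n} D, \\[4pt]
B^{T} &=
\begin{cases}
C^{+}, & C \neq 0, \\
\left(I + D^{T} D\right)^{-1} D^{T} (A^{n})^{+}, & C = 0,
\end{cases} \\[4pt]
(A^{n+1})^{+} &=
\begin{bmatrix} (A^{n})^{+} - D B^{T} \\ B^{T} \end{bmatrix}, \qquad
W^{n+1} =
\begin{bmatrix} W^{n} - D B^{T} Y \\ B^{T} Y \end{bmatrix}.
\end{aligned}
```

Only the pseudoinverse of the new block $C$ has to be computed at each step, which is why adding nodes stays cheap even as the flat network grows.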
The learning scheme is given as follows. Consider a training sample $X \in \mathbb{R}^{304 \times 36}$ with output $Y \in \mathbb{R}^{304 \times 1}$. For $j = 1, \ldots, m$, the parameters $W_{h_j}$ and $\beta_{h_j}$ are randomly generated, and the new function (enhancement node) is calculated from them. The resulting functions are collected into the enhancement node group. The critical step is to calculate the pseudoinverse of the expanded node matrix, from which the weights are then calculated. If the training error threshold is not satisfied, a new generation of $W_{h_{j+1}}$, $\beta_{h_{j+1}}$, $H^{m+1}$, and $A^{m+1}$ is randomly produced. The process continues until the training error threshold is satisfied.
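The scheme described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the raw inputs plus a bias column stand in for the feature nodes, `tanh` is assumed as the enhancement function, the pseudoinverse is recomputed in full at every step rather than updated incrementally, and the data are synthetic (not the survey data). The names `node_matrix`, `train_broad`, and `predict` are hypothetical.

```python
import numpy as np


def node_matrix(X, params):
    """Flat layer: raw inputs, a bias column, then every enhancement group."""
    blocks = [X, np.ones((X.shape[0], 1))]
    blocks += [np.tanh(X @ Wh + bh) for Wh, bh in params]
    return np.hstack(blocks)


def train_broad(X, Y, group_size=10, max_groups=20, tol=0.05, seed=0):
    """Add random enhancement groups until the training RMSE reaches tol."""
    rng = np.random.default_rng(seed)
    params, history = [], []
    while True:
        A = node_matrix(X, params)
        W = np.linalg.pinv(A) @ Y  # output weights via the pseudoinverse
        rmse = float(np.sqrt(np.mean((A @ W - Y) ** 2)))
        history.append(rmse)
        if rmse <= tol or len(params) >= max_groups:
            return W, params, history
        # Threshold not met: draw a new random group W_h, beta_h.
        Wh = rng.standard_normal((X.shape[1], group_size))
        bh = rng.standard_normal(group_size)
        params.append((Wh, bh))


def predict(X, W, params):
    return node_matrix(X, params) @ W


# Synthetic stand-in data with the shapes quoted in the text.
data_rng = np.random.default_rng(1)
X = data_rng.random((304, 36))
Y = np.sin(X[:, :3].sum(axis=1, keepdims=True)) + 0.01 * data_rng.standard_normal((304, 1))

W, params, history = train_broad(X, Y)
print(f"groups added: {len(params)}, RMSE {history[0]:.4f} -> {history[-1]:.4f}")
```

Because each new group only appends columns to the node matrix, the least-squares training error cannot increase as nodes are added; the testing error carries no such guarantee.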
Remark 1. The idea is interesting since adding nodes is easy and the calculation does not take up much time. In this way, broad learning can be applied efficiently.
In the test, the performance of broad learning is reported. Two measures are shown: training accuracy and testing accuracy. Of the 404 samples, if "P" samples are used for testing, the remaining "404 − P" samples are employed for model training.
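The split described above can be sketched as follows. The source does not state how the P test samples are chosen, so the random selection, the seed, and the function name `split_samples` are assumptions.

```python
import random
from typing import List, Tuple


def split_samples(n_samples: int, n_test: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hold out n_test randomly chosen indices; the rest are used for training."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


for p in (20, 30, 40):
    train_idx, test_idx = split_samples(404, p)
    print(f"P = {p}: {len(train_idx)} training samples, {len(test_idx)} test samples")
```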
In Figures 1-3, the cases using 20 test samples are studied. The results show that as nodes are added, the training accuracy increases while the testing accuracy fluctuates. Overall, however, the accuracy trend is good, and the differences are due to the randomness of the algorithm.
In Figures 4-6, the cases using 30 test samples are studied. The training accuracy increases as nodes are added, while the testing accuracy fluctuates. This depends on the quality of the test samples. It can be concluded that data quality is important for the analysis, while broad learning can efficiently fulfill the requirement of regression.
In Figures 7-9, the cases using 40 test samples are studied. The training accuracy increases as nodes are added, while the testing accuracy is stable.
From the 3 sets of tests, it can be seen that the broad learning-based design can efficiently build the relation between the input signals and the output information, even though the model is nonlinear and unknown. During the process, the random vector functional link neural network is generated using the function, and the network can grow with more features while the computational efficiency remains very high. For prediction, because of the randomness of the samples, the accuracy is also highly dependent on the data quality.

Conclusions and Discussions
In this paper, broad learning is applied to data analysis and optimization. The questionnaire survey data are first analyzed, and labels are then assigned to the inputs and the output. The design using broad learning demonstrates the performance of the learning algorithm. It is also shown that data quality is important for accuracy.
For future work, the pruning of nodes is of interest. In addition, because of the randomness, the performance is not very stable. Statistical methods could perhaps be applied to the randomness so that the results become more robust. More complex questionnaires can be designed to collect more information, and the learning scheme is expected to extract some critical features for mental tests.