
Against the background of “innovation and entrepreneurship,” helping college students choose scientifically and rationally between employment and independent entrepreneurship, according to their own comprehensive situation, is of great significance both to their career planning and development and to the social adaptation of university personnel training. This study develops an adaptive support vector machine framework, called RF-CSCA-SVM, for predicting college students' entrepreneurial intention in advance, that is, whether students choose to start a business or to find a job after graduation. RF-CSCA-SVM combines random forest (RF), support vector machine (SVM), the sine cosine algorithm (SCA), and chaotic local search. In this framework, RF is used to select the most important factors; SVM is employed to model the relationship between these factors and students' decision to start their own business or look for a job; SCA is used to tune the optimal parameters of SVM; and chaotic local search is utilized to enhance the search capability of SCA. Data from 300 students were collected to develop the predictive model. To validate the developed method, four other metaheuristic-based SVM methods were used for comparison in terms of classification accuracy, Matthews correlation coefficient (MCC), sensitivity, and specificity. The experimental results demonstrate that the proposed method achieves excellent predictive performance. Promisingly, the established adaptive SVM framework might serve as a new candidate among powerful tools for entrepreneurial intention prediction.

In recent years, with the expansion of college enrollment in China, the number of college graduates has increased every year and the employment situation has become increasingly tense. To address the employment problem, the government and universities are trying all kinds of solutions. For example, encouraging some college students to carry out their own entrepreneurial activities and implementing the strategy of “employment driven by entrepreneurship” is an important way to alleviate the current employment problem of college students. However, statistics show that the proportion of college students starting a business after graduation is still low: less than 5% of graduates choose to start their own businesses. At the same time, the satisfaction rate with entrepreneurship three years after graduation is not high. Although many students have received relevant entrepreneurship and innovation education at school, they are often still confused about the choice between entrepreneurship and employment when they graduate. Therefore, it is necessary to analyze in depth the rational choice behavior of college students between employment and entrepreneurship, in order to find an effective way for students to choose their direction after graduation scientifically. At present, a large number of datasets have been accumulated in colleges and universities. By deeply mining and analyzing these data, we can establish an intelligent prediction model to identify the factors that affect whether college students choose entrepreneurship or employment after graduation, and then further analyze the potential correlations among these factors to guide students toward a better choice.

Up to now, data mining technology has been applied to build models for analyzing issues related to the employment of college graduates. The decision tree (DT) is one of the most commonly used models. Zang et al. [

To sum up, although there is much work on predicting students' employability, there is no report on predicting college students' entrepreneurial intention after graduation. This paper attempts to use a new machine learning framework to predict students' entrepreneurial intention after graduation, that is, whether they choose to start a business or not. The proposed framework consists of three main parts. The first part uses the random forest (RF) method to select the key features in the data; the second part uses a chaotic local search based sine cosine algorithm (CSCA) to optimize an SVM model; and the third part uses the CSCA-SVM obtained from the previous stage to predict new samples. The sine cosine algorithm (SCA) is a new swarm intelligence method that was proposed recently by Mirjalili [

The rest of this paper is structured as follows. Section

In the SVM model, the Gaussian kernel function is needed to compute high-dimensional inner products. The gamma value of the Gaussian kernel mainly determines the kernel width of the SVM. Meanwhile, the soft-margin support vector machine needs to introduce a penalty factor to reduce noise interference. The values of gamma and the penalty factor have a great impact on the classification results. Currently, the grid search method and the gradient descent method are commonly used for parameter optimization of SVM, but their main disadvantage is that they easily fall into local optima. SCA with a chaotic local search mechanism can strike a good balance between local and global search ability, so using it to search for these two key SVM parameters can better determine their optimal values.
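To make the role of these two parameters concrete, the following sketch (not part of the original study; it uses scikit-learn's SVC on synthetic data as a stand-in for the student dataset) shows how cross-validated accuracy varies with the penalty factor C and the kernel width gamma:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the 300-sample, 9-feature student dataset.
X, y = make_classification(n_samples=300, n_features=9, random_state=0)

# Cross-validated accuracy over a small grid of the two key SVM parameters.
scores = {}
for C in (0.1, 1.0, 10.0):
    for gamma in (0.01, 0.1, 1.0):
        clf = SVC(C=C, gamma=gamma, kernel="rbf")
        scores[(C, gamma)] = cross_val_score(clf, X, y, cv=5).mean()

best_C, best_gamma = max(scores, key=scores.get)
```

Even on this toy grid the accuracy changes noticeably with (C, gamma), which is why a smarter search than exhaustive enumeration is attractive.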

This section will give a detailed introduction of the proposed framework for the entrepreneurial intention prediction of college students, named RF-CSCA-SVM. The main flow of the framework is shown in Figure

Flowchart of the proposed RF-CSCA-SVM framework.

The main steps conducted by the RF-CSCA-SVM are described in detail as follows:

Step 1: Normalize the data to

Step 2: Each feature of the data is evaluated using RF algorithm and the optimal subset is selected in an incremental manner based on the importance of each feature.

Step 3: Initialize a population randomly based on the upper and lower bounds of the variables.

Step 4: Evaluate the fitness of all search agents by training an SVM with each agent's position as the parameter pair, and update the best solution obtained so far.

Step 5: Update the position of each search agent according to chaotic local search strategy.

Step 6: Check if any search agent goes beyond the search space and amend it.

Step 7: Evaluate the fitness of all search agents by training an SVM with each agent's position as the parameter pair, and update the best solution obtained so far.

Step 8: Update iteration

Step 9: Return the best solution as the optimal SVM parameter pair (
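As a rough sketch, Steps 1 and 3-9 can be organized as below (a simplified illustration, not the authors' code; the SCA and chaotic local search update of Step 5 is reduced to a placeholder, the parameter bounds are assumed, and Step 2's RF feature selection is assumed to have been applied already):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def fitness(agent, X, y):
    # Steps 4 and 7: CV accuracy of an SVM parameterised by one agent (C, gamma).
    C, gamma = agent
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

def tune_svm(X, y, n_agents=5, n_iter=3, seed=0):
    rng = np.random.default_rng(seed)
    X = MinMaxScaler().fit_transform(X)                  # Step 1: normalise the data
    lb = np.array([0.01, 0.001])                         # assumed lower bounds (C, gamma)
    ub = np.array([100.0, 10.0])                         # assumed upper bounds
    agents = lb + rng.random((n_agents, 2)) * (ub - lb)  # Step 3: random population
    best, best_fit = agents[0].copy(), -np.inf
    for _ in range(n_iter):                              # Step 8: iterate until the limit
        for a in agents:                                 # Steps 4/7: evaluate, track best
            f = fitness(a, X, y)
            if f > best_fit:
                best, best_fit = a.copy(), f
        # Step 5 (placeholder for the SCA + chaotic local search position update):
        agents = agents + rng.normal(scale=0.2, size=agents.shape) * (best - agents)
        agents = np.clip(agents, lb, ub)                 # Step 6: amend out-of-bound agents
    return best, best_fit                                # Step 9: optimal (C, gamma)

X, y = make_classification(n_samples=120, n_features=6, random_state=0)
(best_C, best_gamma), acc = tune_svm(X, y)
```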

The computational complexity of the CSCA-SVM method depends mainly on the number of training samples n, the number of search agents N, and the maximum number of iterations T. Training an SVM on n samples costs O(n^3), so evaluating the fitness of all search agents in one iteration costs O(N·n^3), while updating their positions costs O(N·d) for d decision variables. The overall complexity of CSCA-SVM is therefore on the order of O(T·N·(n^3 + d)).

Random forest (RF) [
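The RF-based feature ranking and incremental subset selection described in Step 2 can be sketched as follows (an illustration using scikit-learn's impurity-based importances on synthetic data; the actual study applies this to the nine student attributes):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=9, n_informative=4, random_state=0)

# Rank the features by RF importance, most important first.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]

# Grow the feature subset incrementally and score each size with an SVM.
acc_by_size = {}
for k in range(1, X.shape[1] + 1):
    acc_by_size[k] = cross_val_score(SVC(), X[:, order[:k]], y, cv=5).mean()

best_size = max(acc_by_size, key=acc_by_size.get)
```

The subset size with the highest cross-validated accuracy is then carried forward to the parameter-tuning stage.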

Support vector machine (SVM) is an advanced artificial intelligence technology [

SVM is a learning algorithm mainly for small samples, which is very suitable for the predictive modeling of the present case. In this experiment, we adopt the proposed CSCA to obtain the optimal parameters of the SVM model, which are used to construct the optimal classification function, as shown in the following:

In the equation above, α_i is the Lagrange coefficient and x_i denotes the samples to be tested (

SCA was first put forward by Mirjalili

The mathematical formula of the update agent location used in the SCA algorithm is presented as follows:

The general framework of SCA is as in Algorithm

Initialize a set of search agents (solutions) (

Update r_{1}, r_{2}, r_{3}, and r_{4}
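The SCA position update outlined above can be sketched in code as follows (a simplified demo following Mirjalili's published formulation, not the authors' implementation; the sphere function and the bounds are illustrative choices):

```python
import numpy as np

def sca_update(agents, best, t, t_max, a=2.0, rng=None):
    """One iteration of the sine cosine position update (Mirjalili's SCA)."""
    rng = rng or np.random.default_rng()
    r1 = a - t * (a / t_max)                 # r1 decreases linearly from a to 0
    r2 = rng.uniform(0.0, 2.0 * np.pi, size=agents.shape)
    r3 = rng.uniform(0.0, 2.0, size=agents.shape)
    r4 = rng.random(size=agents.shape)       # switch between sine and cosine branches
    step = np.abs(r3 * best - agents)
    return np.where(r4 < 0.5,
                    agents + r1 * np.sin(r2) * step,
                    agents + r1 * np.cos(r2) * step)

# Demo: minimise the sphere function f(x) = sum(x**2).
rng = np.random.default_rng(0)
agents = rng.uniform(-5.0, 5.0, size=(20, 2))
fit = lambda x: float(np.sum(x ** 2))
best = min(agents, key=fit).copy()
init_fit = fit(best)
for t in range(1, 101):
    agents = np.clip(sca_update(agents, best, t, 100, rng=rng), -5.0, 5.0)
    cand = min(agents, key=fit)
    if fit(cand) < fit(best):                # greedy update of the destination point
        best = cand.copy()
```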

Chaos is a classic nonlinear natural phenomenon that has long attracted attention. It has many unique characteristics: chaos is very sensitive to its initial conditions, and it also exhibits randomness and ergodicity [

Integrating the CLS mechanism into the SCA optimization algorithm can not only enhance its search ability but also help it avoid falling into local optima. In this paper, the well-known logistic map is used to generate chaotic factors and construct the chaotic system. The logistic formula [

where

In this paper, chaotic local search is described as follows:

For the SCA algorithm, the CLS strategy accelerates its convergence speed. This acceleration is achieved by continually generating new positions with chaotic factors in the iteration process and continually adopting greedy methods to preserve the optimal solution. In this way, the solutions of the whole population will be optimized, and the optimal solution will naturally move towards the global optimum. The general framework of CSCA is as shown in Figure
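A minimal sketch of such a chaotic local search, using the logistic map with mu = 4 to perturb the current best solution and greedy acceptance (an illustration under assumed settings, such as the search radius and step count, rather than the authors' exact procedure):

```python
import numpy as np

def chaotic_local_search(best, fitness, lb, ub, n_steps=50, seed=1):
    """Refine `best` by logistic-map perturbations with greedy acceptance."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.05, 0.95, size=best.shape)   # avoid the map's fixed points
    x_best, f_best = best.copy(), fitness(best)
    radius = 0.1 * (ub - lb)                       # assumed local-search radius
    for _ in range(n_steps):
        z = 4.0 * z * (1.0 - z)                    # logistic map: z <- mu*z*(1-z), mu = 4
        cand = np.clip(x_best + (2.0 * z - 1.0) * radius, lb, ub)
        f = fitness(cand)
        if f < f_best:                             # greedy: keep only improvements
            x_best, f_best = cand.copy(), f
    return x_best, f_best

# Demo: refine a rough solution of the sphere function.
lb, ub = np.full(2, -5.0), np.full(2, 5.0)
start = np.array([1.0, -1.5])
refined, f_refined = chaotic_local_search(start, lambda x: float(np.sum(x ** 2)), lb, ub)
```

Because candidates are only accepted when they improve the objective, the refined solution is never worse than the starting point.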

Flowchart of CSCA.

The data in this paper is mainly from the 2016 and 2017 graduates of Wenzhou University. From these graduates, 300 students were selected as research subjects, of which 136 were self-employed and 164 were employed. Through the analysis of the subjects’ gender, political status, level of education, major, education year, type of normal school students, type of poor students, grade point average (GPA), and total credits, this study intends to investigate the importance and interrelationships of these nine attributes so as to establish a predictive model for decision support. Table

Description of the Nine Attributes.

No. | Attribute | Description
---|---|---

F1 | Gender | Male and female students are represented by 1 and 2, respectively. |

F2 | Political Status | It is divided into four categories: members of Communist Party of China, probationary members of the CPC, members of the Communist Youth League, and the masses, represented by 1, 2, 3, and 4, respectively. |

F3 | Level of Education | It is divided into master’s degree, undergraduate degree, and college degree, represented by 1, 2, and 3, respectively. |

F4 | Major | It is divided into liberal arts and science, represented by 1 and 2, respectively. |

F5 | Education Year | It is divided into 2 years, 3 years, 4 years, and 5 years, represented by 1, 2, 3, and 4, respectively. |

F6 | Type of Normal School Students | It is divided into normal students and nonnormal students, represented by 1 and 2, respectively. |

F7 | Type of Poor Students | It is divided into four categories: nondifficult students, employment difficulties, family difficulties, and employment and family difficulties, represented by 1, 2, 3 and 4, respectively. |

F8 | Grade Point Average (GPA) | GPA is a way for the school to assess students’ learning quality. The score lies within the interval 0 to 4. |

F9 | Total Credits | It is a unit of measurement used to calculate students’ learning volume. The more credits students receive, the more they learn. |

To validate the proposed approach, we conducted a comparative study between the proposed method and several SVM methods based on other nature-inspired algorithms, including PSO, GA, and SCA. LIBSVM [

Data were first scaled into a common range. The acceleration coefficients c_1 and c_2 in PSO were set to 2, and the inertial weight

To evaluate the proposed method, commonly used evaluation criteria such as classification accuracy (ACC), sensitivity, specificity, and Matthews Correlation Coefficients (MCC) were analyzed. They are defined as follows:
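These criteria can be computed directly from confusion-matrix counts; the sketch below (illustrative code, not from the paper) checks them against the first hold-out run reported later, whose reported metrics are consistent with tp = 36, fn = 2, tn = 14, fp = 8:

```python
def classification_metrics(tp, fn, tn, fp):
    """ACC, sensitivity, specificity, and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)                      # true-positive rate
    specificity = tn / (tn + fp)                      # true-negative rate
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sensitivity, specificity, mcc

# Counts consistent with the first 80/20 hold-out run (60 test samples).
acc, sens, spec, mcc = classification_metrics(tp=36, fn=2, tn=14, fp=8)
```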

In order to evaluate the performance of the CSCA algorithm, a series of experiments on classical benchmark functions were conducted in this section. The benchmark functions can be divided into three parts: unimodal (see Table

Unimodal benchmark functions.

Function | Dim | Range | |
---|---|---|---|

| 30 | | 0 |

| 30 | | 0 |

| 30 | | 0 |

| 30 | | 0 |

| 30 | | 0 |

| 30 | | 0 |

| 30 | | 0 |

Multimodal benchmark functions.

Function | Dim | Range | |
---|---|---|---|

| 30 | | |

| 30 | | 0 |

| 30 | | 0 |

| 30 | | 0 |

| 30 | | 0 |

| |||

| |||

| 30 | | 0 |

Fixed-dimension multimodal benchmark functions.

Function | Dim | Range | |
---|---|---|---|

| 2 | | 1 |

| 4 | | 0.00030 |

| 2 | | -1.0316 |

| 2 | | 0.398 |

| 2 | | 3 |

| 3 | | -3.86 |

| 6 | | -3.32 |

| 4 | | -10.1532 |

| 4 | | -10.4028 |

| 4 | | -10.5363 |

The performance of CSCA is compared with MFO, BA, DA, FPA, GOA, SSA, and the original SCA, and the simulation results of CSCA on the benchmark functions are presented in Table

Results of testing benchmark functions.

Function | Metric | CSCA | MFO | BA | DA | FPA | GOA | SSA | SCA |
---|---|---|---|---|---|---|---|---|---|

_{ 1 } | mean | 1.85E − 09 | 2.12E − 01 | 1.59E + 01 | 1.57E + 03 | 5.04E + 02 | 3.76E + 01 | 1.60E − 08 | 2.89E − 02 |

std | 3.34E − 09 | 6.31E − 01 | 1.41E + 00 | 5.17E + 02 | 2.01E + 02 | 1.98E + 01 | 2.75E − 09 | 3.48E − 02 | |

rank | 1 | 4 | 5 | 8 | 7 | 6 | 2 | 3 | |

| |||||||||

_{ 2 } | mean | 3.04E − 08 | 3.61E + 01 | 2.71E + 01 | 2.32E + 01 | 1.57E + 01 | 1.62E + 01 | 2.30E + 00 | 3.94E − 05 |

std | 6.29E − 08 | 2.36E + 01 | 2.87E + 01 | 5.70E + 00 | 3.51E + 00 | 9.91E + 00 | 2.36E + 00 | 9.20E − 05 | |

rank | 1 | 8 | 7 | 6 | 4 | 5 | 3 | 2 | |

| |||||||||

_{ 3 } | mean | 7.94E − 07 | 2.61E + 04 | 9.55E + 01 | 1.78E + 04 | 4.73E + 02 | 2.67E + 03 | 8.15E + 02 | 4.44E + 03 |

std | 1.09E − 06 | 1.42E + 04 | 2.75E + 01 | 7.37E + 03 | 2.01E + 02 | 1.24E + 03 | 4.57E + 02 | 4.51E + 03 | |

rank | 1 | 8 | 2 | 7 | 3 | 5 | 4 | 6 | |

| |||||||||

_{ 4 } | mean | 1.99E − 06 | 7.87E + 01 | 2.63E + 00 | 2.71E + 01 | 1.85E + 01 | 1.40E + 01 | 1.30E + 01 | 2.63E + 01 |

std | 1.53E − 06 | 5.89E + 00 | 1.42E + 00 | 5.95E + 00 | 3.64E + 00 | 5.05E + 00 | 4.28E + 00 | 8.68E + 00 | |

rank | 1 | 8 | 2 | 7 | 5 | 4 | 3 | 6 | |

| |||||||||

_{ 5 } | mean | 7.32E − 10 | 9.69E + 03 | 4.73E + 03 | 1.96E + 05 | 3.45E + 04 | 4.59E + 03 | 1.97E + 02 | 3.10E + 02 |

std | 9.69E − 10 | 2.83E + 04 | 1.40E + 03 | 1.29E + 05 | 1.88E + 04 | 5.86E + 03 | 2.32E + 02 | 5.51E + 02 | |

rank | 1 | 6 | 5 | 8 | 7 | 4 | 2 | 3 | |

| |||||||||

_{ 6 } | mean | 9.97E − 12 | 9.90E + 02 | 1.53E + 01 | 1.54E + 03 | 5.14E + 02 | 3.63E + 01 | 1.64E − 08 | 5.99E + 00 |

std | 1.05E − 11 | 3.13E + 03 | 1.80E + 00 | 6.43E + 02 | 1.85E + 02 | 1.11E + 01 | 5.02E − 09 | 2.13E + 00 | |

rank | 1 | 7 | 4 | 8 | 6 | 5 | 2 | 3 | |

| |||||||||

_{ 7 } | mean | 3.21E − 06 | 5.34E + 00 | 3.23E + 01 | 6.37E − 01 | 3.10E − 01 | 1.21E + 00 | 1.51E − 01 | 2.45E − 02 |

std | 2.74E − 06 | 1.35E + 01 | 2.22E + 01 | 3.48E − 01 | 1.42E − 01 | 8.11E − 01 | 5.34E − 02 | 2.81E − 02 | |

rank | 1 | 7 | 8 | 5 | 4 | 6 | 3 | 2 | |

| |||||||||

_{ 8 } | mean | −1.26E + 04 | −7.83E + 03 | −7.19E + 03 | −5.39E + 03 | −7.77E + 03 | −7.63E + 03 | −7.82E + 03 | −3.92E + 03 |

std | 3.10E − 10 | 8.14E + 02 | 9.45E + 02 | 5.64E + 02 | 1.86E + 02 | 3.64E + 02 | 6.89E + 02 | 2.49E + 02 | |

rank | 1 | 2 | 6 | 7 | 4 | 5 | 3 | 8 | |

| |||||||||

_{ 9 } | mean | 1.31E − 10 | 1.78E + 02 | 2.87E + 02 | 1.84E + 02 | 9.84E + 01 | 1.06E + 02 | 7.65E + 01 | 1.84E + 01 |

std | 2.67E − 10 | 3.90E + 01 | 3.75E + 01 | 3.00E + 01 | 2.21E + 01 | 3.70E + 01 | 1.80E + 01 | 2.03E + 01 | |

rank | 1 | 6 | 8 | 7 | 4 | 5 | 3 | 2 | |

| |||||||||

_{ 10 } | mean | 3.78E − 06 | 1.73E + 01 | 1.04E + 01 | 9.77E + 00 | 9.19E + 00 | 6.92E + 00 | 2.50E + 00 | 1.15E + 01 |

std | 2.23E − 06 | 4.72E + 00 | 7.59E + 00 | 9.61E − 01 | 1.12E + 00 | 1.26E + 00 | 2.81E − 01 | 1.01E + 01 | |

rank | 1 | 8 | 6 | 5 | 4 | 3 | 2 | 7 | |

| |||||||||

_{ 11 } | mean | 2.23E − 09 | 3.65E + 01 | 6.41E − 01 | 1.54E + 01 | 5.03E + 00 | 1.08E + 00 | 8.61E − 03 | 3.41E − 01 |

std | 5.07E − 09 | 6.37E + 01 | 5.62E − 02 | 6.87E + 00 | 1.55E + 00 | 4.87E − 02 | 9.29E − 03 | 3.20E − 01 | |

rank | 1 | 8 | 4 | 7 | 6 | 5 | 2 | 3 | |

| |||||||||

_{ 12 } | mean | 3.26E − 12 | 6.32E + 03 | 1.40E + 01 | 1.16E + 03 | 7.33E + 00 | 9.51E + 00 | 8.54E + 00 | 4.84E + 01 |

std | 7.88E − 12 | 2.00E + 04 | 5.91E + 00 | 2.16E + 03 | 4.07E + 00 | 4.35E + 00 | 5.66E + 00 | 1.34E + 02 | |

rank | 1 | 8 | 5 | 7 | 2 | 4 | 3 | 6 | |

| |||||||||

_{ 13 } | mean | 1.38E − 11 | 7.28E + 00 | 2.42E + 00 | 1.58E + 05 | 5.93E + 03 | 3.64E + 01 | 1.40E + 01 | 8.21E + 03 |

std | 1.75E − 11 | 1.15E + 01 | 4.34E − 01 | 1.41E + 05 | 1.08E + 04 | 1.56E + 01 | 1.38E + 01 | 2.59E + 04 | |

rank | 1 | 3 | 2 | 8 | 6 | 5 | 4 | 7 | |

| |||||||||

_{ 14 } | mean | 9.98E − 01 | 3.73E + 00 | 8.30E + 00 | 9.98E − 01 | 9.98E − 01 | 5.39E + 00 | 9.98E − 01 | 2.20E + 00 |

std | 7.24E − 15 | 4.51E + 00 | 5.60E + 00 | 7.54E − 10 | 7.11E − 09 | 5.01E + 00 | 1.81E − 16 | 1.01E + 00 | |

rank | 2 | 6 | 8 | 3 | 4 | 7 | 1 | 5 | |

| |||||||||

_{ 15 } | mean | 3.49E − 04 | 3.03E − 03 | 8.76E − 03 | 7.11E − 03 | 4.90E − 04 | 1.98E − 02 | 7.57E − 04 | 1.04E − 03 |

std | 1.44E − 05 | 6.11E − 03 | 9.99E − 03 | 9.15E − 03 | 2.36E − 04 | 2.70E − 02 | 2.32E − 04 | 3.90E − 04 | |

rank | 1 | 5 | 7 | 6 | 2 | 8 | 3 | 4 | |

| |||||||||

_{ 16 } | mean | −1.03E + 00 | −1.03E + 00 | −1.03E + 00 | −1.03E + 00 | −1.03E + 00 | −1.03E + 00 | −1.03E + 00 | −1.03E + 00 |

std | 1.87E − 05 | 0.00E + 00 | 7.90E − 04 | 1.11E − 05 | 3.24E − 11 | 9.56E − 15 | 1.14E − 14 | 3.82E − 05 | |

rank | 6 | 1 | 8 | 5 | 4 | 3 | 2 | 7 | |

| |||||||||

_{ 17 } | mean | 3.99E − 01 | 3.98E − 01 | 3.98E − 01 | 3.98E − 01 | 3.98E − 01 | 3.98E − 01 | 3.98E − 01 | 3.99E − 01 |

std | 7.87E − 04 | 0.00E + 00 | 5.13E − 04 | 1.36E − 06 | 6.71E − 15 | 1.41E − 14 | 7.41E − 15 | 2.00E − 03 | |

rank | 7 | 1 | 6 | 5 | 2 | 4 | 3 | 8 | |

| |||||||||

_{ 18 } | mean | 3.00E + 00 | 3.00E + 00 | 3.04E + 00 | 3.00E + 00 | 3.00E + 00 | 5.70E + 00 | 3.00E + 00 | 3.00E + 00 |

std | 2.31E − 05 | 1.75E − 15 | 3.09E − 02 | 4.13E − 05 | 8.27E − 13 | 8.54E + 00 | 1.29E − 13 | 4.81E − 05 | |

rank | 4 | 1 | 7 | 5 | 3 | 8 | 2 | 6 | |

| |||||||||

_{ 19 } | mean | −3.85E + 00 | −3.86E + 00 | −3.83E + 00 | −3.86E + 00 | −3.86E + 00 | −3.58E + 00 | −3.86E + 00 | −3.86E + 00 |

std | 2.12E − 03 | 2.49E − 03 | 1.25E − 02 | 1.37E − 04 | 5.98E − 12 | 5.48E − 01 | 5.84E − 14 | 3.13E − 03 | |

rank | 6 | 4 | 7 | 3 | 2 | 8 | 1 | 5 | |

| |||||||||

_{ 20 } | mean | −3.08E + 00 | −3.23E + 00 | −2.86E + 00 | −3.26E + 00 | −3.32E + 00 | −3.25E + 00 | −3.22E + 00 | −3.05E + 00 |

std | 5.03E − 01 | 4.24E − 01 | 5.40E − 02 | 1.29E − 01 | 1.22E − 01 | 4.20E − 03 | 1.74E − 01 | 3.53E − 01 | |

rank | 6 | 4 | 8 | 2 | 1 | 3 | 5 | 7 | |

| |||||||||

_{ 21 } | mean | −1.02E + 01 | −6.90E + 00 | −3.24E + 00 | −6.05E + 00 | −1.02E + 01 | −4.40E + 00 | −6.63E + 00 | −2.98E + 00 |

std | 2.21E − 10 | 3.55E + 00 | 1.64E + 00 | 2.86E + 00 | 2.00E − 04 | 3.12E + 00 | 3.17E + 00 | 2.53E + 00 | |

rank | 1 | 3 | 7 | 5 | 2 | 6 | 4 | 8 | |

| |||||||||

_{ 22 } | mean | −1.04E + 01 | −9.64E + 00 | −6.87E + 00 | −5.13E + 00 | −1.04E + 01 | −6.52E + 00 | −8.58E + 00 | −2.51E + 00 |

std | 2.71E − 10 | 2.42E + 00 | 3.02E + 00 | 3.03E + 00 | 9.23E − 04 | 3.48E + 00 | 3.00E + 00 | 1.78E + 00 | |

rank | 1 | 3 | 5 | 7 | 2 | 6 | 4 | 8 | |

| |||||||||

_{ 23 } | mean | −1.05E + 01 | −7.57E + 00 | −5.04E + 00 | −6.47E + 00 | −1.05E + 01 | −5.46E + 00 | −7.65E + 00 | −4.56E + 00 |

std | 1.41E − 10 | 3.84E + 00 | 2.85E + 00 | 3.55E + 00 | 2.38E − 02 | 3.67E + 00 | 3.80E + 00 | 1.49E + 00 | |

rank | 1 | 4 | 7 | 5 | 2 | 6 | 3 | 8 | |

| |||||||||

Sum of ranks | 48 | 115 | 134 | 136 | 86 | 121 | 64 | 124 |

Average rank | 2.087 | 5 | 5.826 | 5.913 | 3.739 | 5.261 | 2.783 | 5.391 |

Overall rank | 1 | 4 | 7 | 8 | 3 | 5 | 2 | 6

Inspecting the detailed results of the algorithms on the 23 benchmark functions in Table, for the seven unimodal functions (F1-F7) and the six multimodal functions (F8-F13), the proposed CSCA algorithm outperforms all other algorithms. Comparing the CSCA and SCA metrics, the performance of CSCA is clearly improved over the basic SCA. In addition, the ranking results also show that CSCA provides the best solution among all algorithms in terms of the mean index. For the ten fixed-dimension multimodal functions (F14-F23), CSCA attains the exact optimal solutions for F15, F21, F22, and F23. For the other six functions (F14, F16, F17, F18, F19, and F20), although the improved CSCA is not better than every other optimizer in some cases, it still obtains better results than the basic SCA in more than 90% of the fixed-dimension multimodal cases. These results show that the chaotic local search utilized in CSCA effectively enhances the efficacy of SCA. Moreover, based on the rankings, the developed CSCA achieves the best place overall, with SSA, FPA, MFO, SCA, BA, and DA in the next places, respectively.

Moreover, to visually show the performance of CSCA, the convergence curves of CSCA, MFO, BA, DA, FPA, GOA, SSA, and the original SCA on some typical benchmark functions are also provided in Figure. For F1, F4, and F7, the proposed CSCA algorithm reaches the best solution, and the worst results of CSCA are much better than the best values of the classical SCA and the other six algorithms; clearly, the performance of CSCA on unimodal cases is improved and CSCA is much better than the other algorithms. Regarding the convergence curves for F8, the proposed technique converges very quickly during the early steps. For F9, the fastest convergence also belongs to the CSCA algorithm, while MFO, BA, DA, FPA, GOA, SSA, and SCA cannot show a better trend. For F12, the proposed CSCA shows the best function value in the early stages, while the other optimizers all fall into local optima because of their weaker search capability. For F15, the proposed CSCA algorithm reaches the best solution, and the worst result of CSCA is much better than the best solutions of the other methods. For F22 and F23, CSCA has the best performance among all in terms of the std. index and provides better results on these fixed-dimension cases.

Convergence curve of CSCA and other algorithms.

To sum up, we can conclude that the proposed CSCA algorithm achieves better search performance and is better at escaping local optima than all other competitors.

In this experiment, we first used random forest (RF) to evaluate the importance of each feature of the dataset. The results of the evaluation are shown in Figure

Performance of RF-CSCA-SVM on different sizes of feature subset.

Size of feature subset | ACC | Sensitivity | Specificity | MCC |
---|---|---|---|---|

1 | 0.6600 (0.0625) | 0.6357 (0.0473) | 0.7605 (0.1534) | 0.3265 (0.1420) |

2 | 0.7300 (0.0637) | 0.7399 (0.0889) | 0.7323 (0.0719) | 0.4600 (0.1313) |

3 | 0.7767 (0.0802) | 0.7815 (0.1025) | 0.7862 (0.0850) | 0.5560 (0.1621) |

4 | 0.7567 (0.0649) | 0.7701 (0.0750) | 0.7553 (0.1014) | 0.5136 (0.1398) |

| | | | |

6 | 0.7733 (0.0872) | 0.8168 (0.1084) | 0.7413 (0.1080) | 0.5529 (0.1831) |

7 | 0.7867 (0.0849) | 0.8482 (0.1187) | 0.7381 (0.0786) | 0.5819 (0.1791) |

8 | 0.7800 (0.0706) | 0.8307 (0.0634) | 0.7464 (0.1101) | 0.5710 (0.1360) |

9 | 0.6967 (0.1024) | 0.7282 (0.1066) | 0.6688 (0.1140) | 0.3928 (0.2073) |

The importance of features evaluated by RF.

In order to verify the effectiveness of the proposed method, we conducted a comparative study between RF-CSCA-SVM and four other SVM models based on different nature-inspired metaheuristic algorithms, including RF-SCA-SVM, RF-MFO-SVM, RF-GOA-SVM, and RF-BA-SVM. The detailed comparison of the five methods is shown in Figure

Classification performance obtained by the five methods in terms of ACC, MCC, sensitivity, and specificity.

Figure

Comparison of convergence trends of several improved SVM methods based on swarm intelligence optimization.

To further evaluate the generalization capability of the proposed method, hold-out validation was conducted. The whole dataset was split into 80% for training and 20% for testing. Owing to the randomness involved, the method was run 10 times. The detailed results and the confusion matrices for the different runs are recorded in Table
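The evaluation protocol above can be sketched as follows (a scikit-learn illustration on synthetic data, with a plain SVM standing in for the full RF-CSCA-SVM pipeline):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=9, random_state=0)

accs = []
for run in range(10):                          # 10 independent 80/20 hold-out runs
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=run)
    model = SVC().fit(Xtr, ytr)
    accs.append(accuracy_score(yte, model.predict(Xte)))

mean_acc, std_acc = float(np.mean(accs)), float(np.std(accs))
```

Averaging over repeated random splits gives a less optimistic estimate of generalization than a single split.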

Results obtained by the proposed RF-CSCA-SVM.

Run | Confusion matrix | | Accuracy | Sensitivity | Specificity | MCC |
---|---|---|---|---|---|---|

1 | | 2 | 0.8333 | 0.9474 | 0.6364 | 0.6361 |

8 | | |||||

| ||||||

2 | | 3 | 0.8500 | 0.9189 | 0.7391 | 0.6787 |

6 | | |||||

| ||||||

3 | | 4 | 0.8333 | 0.8824 | 0.7692 | 0.6591 |

6 | | |||||

| ||||||

4 | | 3 | 0.8167 | 0.9063 | 0.7143 | 0.6367 |

8 | | |||||

| ||||||

5 | | 1 | 0.8167 | 0.9714 | 0.6000 | 0.6371 |

10 | | |||||

| ||||||

6 | | 2 | 0.8333 | 0.9412 | 0.6923 | 0.6659 |

8 | | |||||

| ||||||

7 | | 5 | 0.8333 | 0.8485 | 0.8148 | 0.6633 |

5 | | |||||

| ||||||

8 | | 0 | 0.8500 | 0.9189 | 0.7391 | 0.6787 |

14 | | |||||

| ||||||

9 | | 3 | 0.8500 | 0.9459 | 0.6957 | 0.6807 |

6 | | |||||

| ||||||

10 | | 1 | 0.8333 | 0.8438 | 0.8214 | 0.6652 |

12 | | |||||

| ||||||

Avg. | 0.8350 | 0.9125 | 0.7222 | 0.6602 | ||

| ||||||

Dev. | 0.0123 | 0.0428 | 0.0708 | 0.0178 |

The study yielded some interesting findings. From the experimental results, we can see that the most important features include major (F4), gender (F1), type of normal school students (F6), grade point average (F8), and total credits (F9); the influence of these features on the choice of entrepreneurial intention is relatively prominent. The data show that different majors differ clearly in entrepreneurial intention. On the whole, arts students show higher entrepreneurial intention than science and engineering students, perhaps because they are more active in thinking, face a lower employment rate, and are more highly motivated to start their own businesses. Gender differences also have a significant impact on entrepreneurial intention: the proportion of male students choosing to start a business is much higher than that of female students, perhaps because male students are more adventurous while female students prefer stable jobs. Academic achievement also has a clear impact on entrepreneurial choice. Academic achievement mainly comprises GPA and total credits, and is generally divided into three grades: upper, middle, and lower. Students in the middle stream have no obvious advantage in the job market, so they actively consider how to improve their employment prospects through various channels; their intention to start a business is obviously stronger than that of students in the other streams. Students with better academic performance choose stable, high-paying, or prestigious careers, while students with poor academic performance tend to lack a clear plan. Family situation also has a significant impact on the entrepreneurial willingness and behavior of college students. Students with lower socioeconomic status show higher entrepreneurial willingness, hoping to change their social class status through personal effort.

At the same time, a family's financial situation, experience, and personal connections can provide individual college students with the ability and convenience to start a business, thereby improving the success rate of entrepreneurship. Due to the influence of traditional ideas, normal school students' posts are considered more stable and decent than other professions and their concept of entrepreneurship is weak, so normal school students' entrepreneurial intention and choice are weaker than those of nonnormal school students.

It should be noted that the present study has several limitations that require further discussion. First, the samples involved in this study were limited; to obtain more accurate results, a larger number of samples should be collected to train a less biased learning model. Second, the study was conducted at a single university; confirming the model in multicenter studies would make it more reliable for decision support. Furthermore, the involved attributes are limited; future studies should investigate more attributes that may affect students' entrepreneurial intention.

In this study, we established an improved SVM framework to predict the employment intentions of college graduates. To improve prediction accuracy, this paper first uses RF to screen the key features in the data, then proposes an improved SCA strategy to tune the optimal parameters of SVM, and finally uses the established CSCA-SVM model to predict new samples. Experimental results show that the proposed method achieves better classification performance than SVM methods based on other swarm intelligence optimization methods in terms of ACC, MCC, sensitivity, and specificity. Therefore, we can draw a preliminary conclusion that the proposed prediction framework can effectively predict students' employment intention. In future work, we plan to build a decision support system based on the proposed framework to assist school departments in predicting students' employment intention. In addition, we plan to collect more data samples to improve the predictive performance of the proposed method.

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that there are no conflicts of interest regarding the publication of this article.

This research is supported by the Zhejiang Provincial Natural Science Foundation of China (LY17F020012), Science and Technology Plan Project of Wenzhou, China (ZG2017019), and the Medical and Health Technology Projects of Zhejiang Province, China (2019315504).