Analysis and Evaluation of Sports Effect Based on Random Forest Algorithm under Big Data

Under the background of big data, all walks of life have carried out in-depth informatization construction. As an important part of national education, the informatization construction of universities cannot be ignored. In recent years, the state has promulgated various policies and regulations that guarantee the normal development of school physical education and can, to a certain extent, improve the current decline in college students' physical health. This study attempts to set specific indicators to promote the better implementation of college sports, so that college sports can be supervised and inspected from all aspects as they are actually carried out, and to provide a standard for measuring the implementation effect of college sports. Based on the RF (Random Forest) algorithm, this paper puts forward an evaluation algorithm for students' sports achievements, which can be used to predict students' sports achievements and, at the same time, to find the factors that affect students' learning and rank them by importance. The results show that the confidence level of sports effect evaluation by this method is high, with an average confidence level above 0.96. Conclusion. This method improves sports effect evaluation and can thus effectively guide sports skills training and improve sports skills.


Introduction
Vigorously developing college students' physical education is a major undertaking in building a healthy China and realizing the Chinese dream. College students should be brought to love sports and to make participation in sports a habit of daily life, so that they develop a healthy physique, good interpersonal relationships, and strong character. All social forces should be mobilized to protect the health of college students. The state has repeatedly issued regulations on the quality and quantity of sunshine sports, which shows that it attaches importance to college students' participation in sunshine sports and cares about their physical health. Research shows that college students' physical health in China is declining in many respects, and one of the most important reasons is that students do not cultivate good sports habits and cannot exercise after class by scientific methods or means.
Sports achievement is mainly a measure of whether there is a gap between students' physical fitness and the expected goals, and a means of measuring this gap. It shows only the results of students' physical exercise and cannot reflect the causes of those results. Yang and Li introduced the classic Apriori association rule algorithm and the well-known DT (Decision Tree) ID3 algorithm [1] and used the Apriori algorithm to mine the degree to which excellence in one course influences other courses. Attar et al. put forward a multistrategy design that combines DM technology with statistical analysis and uses a DT-based classification mining method to analyze the data in a student sports achievement database. The resulting DT of students' sports achievements intuitively shows the position of a given achievement under different grade calculation methods and provides evaluation information for teaching departments [2]. Wu et al. used the classical C4.5 algorithm to construct a prediction model of students' professional achievement, covering data processing, pruning optimization of the DT, and evaluation of model performance by cross validation, in order to find potential rules of student behavior [3]. Yuan et al., based on performance evaluation theory, designed a performance evaluation index system that measures students' growth through process performance, taking innovative and design experiments in the teaching stage of students' professional courses as the process [4].
Big data is characterized by large volume, variety, low value density, high velocity, and high timeliness. The era of big data poses new challenges to people's existing ability to control data, and it also provides unprecedented space and potential for gaining deeper and more comprehensive insight. As data continue to accumulate and expand, data mining becomes particularly important. Therefore, scientifically and reasonably mining students' sports achievement data and finding the potential knowledge and information in it, so as to guide students and teaching staff in the next stage of study and work, is of positive significance for improving teaching quality. This paper therefore proposes using the RF (Random Forest) algorithm to analyze many aspects of college students' sports performance accurately and quantitatively, helping teachers, students, and physical education departments formulate corresponding measures. This is conducive to cultivating students' physical exercise habits, promoting a healthy physique, and improving the quality of physical education teaching.

Analysis and Research on Sports Effect.
When human beings engage in sports activities, they cannot live without the environment, so they must exchange material, energy, and information with the natural and social environment. The factors that make up the environment are bound to be connected with sports activities, both as a whole and with each other.
Yi and Fan hold that the school sports environment refers to the sports environment within a school and that, together with the school's moral education, learning, and other environments, it constitutes the school education environment [5]. Ouyang and Liu divided the campus sports environment by content into five subenvironments, namely, the sports consciousness, training, teaching, hygiene, and competition environments, and discussed the influence of each subenvironment on students' physical and mental development [6]. Bideau et al., guided by the concept of lifelong physical education, analyzed both result evaluation and process evaluation so that university physical education teaching can be evaluated from the two aspects of process and result, making the evaluation more scientific and reasonable [7]. Li et al. calculated the weights of each index and each subsystem by AHP (analytic hierarchy process) and produced single and total rankings. The weighted linear method was then used to synthesize the indexes of the system to evaluate the development level of each subsystem and of the system as a whole [8].
Yin and Cui studied the application of PCA (principal component analysis) and a Bayesian KNN (K-nearest neighbor) algorithm in DM (data mining). PCA was used to reduce the dimension of the characteristic data, and the Bayesian K-nearest neighbor algorithm was used for selection and classification, that is, to predict career direction [9]. Sarlis and Tjortjis proposed an improved algorithm after studying the ID3 algorithm; it mines and analyzes the data stored in the educational administration management system to find relationships between course settings and provides a data basis for the university's achievement statistics to assist decision-making [10]. Harvey et al. used long-distance running as a model event to establish a fuzzy evaluation model for data analysis, solving the problem of conversion between the national physical health standards and students' sports scores [11].

Research on RF Algorithm.
RF is an ensemble model that combines multiple DTs to make predictions. Its advantages include the ability to rank variable importance, high prediction accuracy, strong noise resistance, few tuning parameters, strong adaptability to different data sets, and effective avoidance of overfitting.
Poona et al. used RF to select the characteristic genes of five groups of gene expression profile data, such as lung cancer, stomach cancer, and colon cancer, and combined the selected genes with a support vector machine to classify the original data sets [12]. Provost et al. improved the node partition method of each tree in the RF algorithm and proposed an improved RF algorithm that uses customer lifetime value as the dividing node and solves the problem of unbalanced data distribution [13]. Lee et al. used the RF algorithm to classify different land types, and the results showed that RF could accurately classify land cover, consistent with expectations [14]. Naghibi et al. adjusted the weight of each sample according to whether it was correctly classified and proposed a weighted RF combination algorithm based on adaptive boosting [15]. Ming et al. applied a greedy randomized adaptive search procedure with annealing randomness to the RF algorithm, and the resulting feature selection improved classification accuracy [16].
Ghorbanian et al. designed and implemented an improved RF algorithm based on Spark, using Spark's distributed storage and computing technology [17]. Bei et al. put forward an RF model fused with factor analysis. Its main innovation lies in constructing feature groups by factor analysis and then randomly extracting features according to the proportion of features in each group to form the candidate subset of each split node, which improves the accuracy and convergence speed of the model [18]. Raphael [19]. Lin et al. used an improved RF algorithm to build an evaluation model from historical wheat breeding data and put forward the application of the RF algorithm in the auxiliary evaluation of wheat breeding [20].
The experimental results show that the proposed algorithm achieved good results in wheat germplasm evaluation and can assist breeders in variety selection.

Analysis Method of Sports Effect.
Nowadays, computer networks are widespread in universities. A university sports achievement management system built with DM technology should provide administrators, teachers, and students with sufficient information and quick query tools, handle teachers' scoring, and carry out statistics, analysis, and processing of the data. It also helps students set and arrange their own fitness goals and exercise plans scientifically. Physical exercise increases opportunities for contact and communication between people. Taking part in sports activities can help students forget their troubles and pains, ease loneliness, and gradually develop the awareness and habit of interacting with others. It also helps students change bad behaviors and living habits that affect their health.
Cooperation ability is not only a necessary quality for participants in sports activities, but also a kind of ability that needs to be developed through sports activities. College students' participation in sports activities, especially in collective sports activities, requires the full cooperation between themselves and others, which not only enables the collective goal to be achieved, but also gives full play to the role of individuals.
DM is the process of extracting hidden information and knowledge, which people do not know in advance but which is potentially useful, from large amounts of incomplete, noisy, fuzzy, and random data. Detecting anomalies and deviations in such data is of great significance, because it helps optimize the model. There are many types of deviations, such as special cases that do not conform to the basic rules, abnormal cases in clustering, and values that change over time. From the perspective of DM, the application of DM technology to the scientific management of physical examination results can be divided into three stages: data preparation, pattern search, and visual representation. Figure 1 shows the general framework model of the scientific management of physical examination results. The framework in Figure 1 can be realized in four steps: establishing data objects, data preprocessing, data mining, and representation and evaluation of pattern knowledge. The scientific management of physical examination results involves many data sets, such as teacher information, student information, and examination results. Therefore, the first requirement is to establish a suitable DM object and make the goal of DM clear.
Data preprocessing is an important link in the DM process. When the data to be mined contain noise or are incomplete or even inconsistent, preprocessing is all the more necessary to improve the quality of the DM objects and, ultimately, the quality of the pattern knowledge acquired by DM.
Information gain tends to favor tests with many outcomes, that is, attributes with a large number of distinct values [15]. The information gain ratio extends information gain to overcome this bias by normalizing it with a "split information" value, defined as

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}$$

This value represents the information generated by splitting the training data set $D$ into $v$ partitions $D_1, \ldots, D_v$ corresponding to the $v$ outcomes of a test on attribute $A$; the gain ratio is then $\mathrm{GainRatio}(A) = \mathrm{Gain}(A) / \mathrm{SplitInfo}_A(D)$. The base learner of the RF algorithm is the CART (Classification and Regression Tree) model. CART builds a binary tree by recursive binary partitioning: at each step the current sample set is split into two subsets, producing two subtrees. The Gini index measures the impurity of a data set. The Gini index of sample set $D$ is defined as

$$\mathrm{Gini}(D) = 1 - \sum_{j=1}^{m} p_j^2$$

where $p_j$ is the probability that a sample in $D$ belongs to class $C_j$ and $m$ is the number of classes. If a binary split on attribute $A$ divides $D$ into two subsets $D_1$ and $D_2$, the Gini index of this split is the weighted sum of the impurity of each partition:

$$\mathrm{Gini}_A(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

Applying DM technology raises the level of systematic analysis. Through in-depth, scientific analysis of examination results, students can learn their relative position in the group both horizontally and vertically, understand the gains and losses of their own learning and how they differ from others, and improve accordingly. At the same time, teachers and school education decision-makers can understand the problems in teaching and reflect on the quality of education and teaching.
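The impurity measures described above can be checked numerically. The following Python sketch implements the Gini index of a sample set, the weighted Gini index of a binary split, and the split information of a partition; the function names and the pass/fail labels are hypothetical illustrations, not data from this study.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini(D) = 1 - sum_j p_j**2, where p_j is the share of class C_j in D."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(left, right):
    """Weighted Gini index of a binary split of D into D1 (left) and D2 (right)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def split_info(partitions):
    """SplitInfo_A(D) = -sum_j |D_j|/|D| * log2(|D_j|/|D|) over the v partitions."""
    n = sum(len(p) for p in partitions)
    return -sum(len(p) / n * log2(len(p) / n) for p in partitions if p)


# Hypothetical pass/fail labels after splitting D on some binary attribute A.
left = ["pass", "pass", "pass", "fail"]
right = ["fail", "fail", "pass", "fail"]
print(gini(left + right))          # 0.5
print(gini_split(left, right))     # 0.375
print(split_info([left, right]))   # 1.0
```

A CART learner would evaluate `gini_split` for every candidate binary split and keep the one with the smallest value, since a lower weighted Gini index means purer child nodes.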
An observation set for sports effect evaluation is introduced, the sports effect evaluation model is comprehensively analyzed and scheduled, and the reliability weight $\omega$ of sports effect evaluation is set, where $\omega_{\max}$ and $\omega_{\min}$ represent the regulation coefficients of sports effect evaluation, $T_{\max}$ is the time delay, and $t$ is the correlation factor of sports effect evaluation. Combining this with the association-rule scheduling method yields the distributed fusion model of sports effect evaluation [6], in which $\mathrm{New}_i$ represents the distributed scheduling set of sports effect evaluation; on this basis, the fuzzy association rule set and the prior distribution set of sports effect evaluation are constructed. Using the big data fusion analysis method [7], a feature analysis model of sports effect evaluation is established to improve the effectiveness of sports effect evaluation.

Construction of Evaluation Model of Students' Sports Performance.
The construction of university sports environment indicators must follow the scientific principle: the selection and construction of evaluation indicators should be grounded in science, so as to avoid, as far as possible, mistakes caused by empiricism and subjective judgment and to ensure the reliability of the research results. We should therefore select indicators comprehensively and systematically from multiple dimensions. Unless suitable scientific methods exist for converting and measuring the indicators, the credibility of the evaluation will be affected. The evaluation process for college students' campus sports environment must also be operable, feasible, and practical, which is the basis for accurate, feasible evaluation results and for their popularization and application.
Weight is a quantitative value that expresses the relative importance of each evaluation object by comparing and balancing the objects quantitatively. The weights set according to the importance of the indicators for a given evaluation purpose are called weight coefficients [17]. Common subjective weighting methods include the analytic hierarchy process and similar expert-based weighting methods. Subjective weighting requires evaluators to score the importance of each index; its characteristic is that it incorporates expert opinion into the assessment of each index's importance [19].
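As an illustration of how AHP turns pairwise judgments into weight coefficients, and how those weights feed the weighted linear synthesis mentioned earlier, the Python sketch below approximates the principal eigenvector of a comparison matrix by power iteration and computes the consistency index. The comparison values, index scores, and function names are hypothetical; they are not the index system used in this paper.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise comparison matrix
    by power iteration, normalized so the weights sum to 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


def consistency_index(matrix, w):
    """CI = (lambda_max - n) / (n - 1); 0 means perfectly consistent judgments."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return (lambda_max - n) / (n - 1)


def weighted_linear_score(weights, index_scores):
    """Weighted linear synthesis: sum of weight * index score."""
    return sum(w * s for w, s in zip(weights, index_scores))


# Hypothetical judgments: index 1 is twice as important as index 2
# and four times as important as index 3.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparisons)
print([round(w, 3) for w in weights])                          # [0.571, 0.286, 0.143]
print(consistency_index(comparisons, weights))                 # close to 0 here
print(round(weighted_linear_score(weights, [80, 70, 90]), 2))  # 78.57
```

In practice the consistency index is usually divided by a random index to give a consistency ratio before the judgments are accepted; that step is omitted here for brevity.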
Although the traditional RF algorithm is more accurate and efficient than other classification algorithms on high-dimensional data, it has defects on some data. If the data in a classification problem are unbalanced, the classification results predicted by the RF algorithm are not ideal. Because every tree's vote carries the same weight, errors persist in the final results; on unbalanced data sets in particular, the classification results tend to be biased towards the majority class. In addition, the default parameters are not necessarily the best choice, and the traditional grid search method is an inefficient way to find the best parameters.
In view of the above problems in the evaluation of students' sports performance, this paper puts forward an evaluation algorithm for students' sports performance based on the RF algorithm. It can be used to evaluate students' sports performance, to predict students' sports test scores in senior grades from their sports test scores in junior grades, to rank the variables by importance, and to find the key factors that affect students' academic performance.
Building on previous research, this paper proposes a new comprehensive optimization algorithm that integrates simulated annealing into the execution of the RF algorithm. It uses binary coding, OOB (out-of-bag) error minimization, and simulated annealing operations to obtain the best combination of feature selection $O$, DT scale $K$, feature subset scale $N$, and DT weight $W_t$.
We define the objective function of the improved RF algorithm over these variables, where $K$ and $O$ are real numbers, $K \in [0, 500]$, and $w_j \in [0, 15]$. The value of $\mathrm{Attribute}_i$ is 0 or 1, where 0 indicates that the feature has not been selected and 1 indicates that it has been selected.
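The search described above can be sketched as simulated annealing over a binary feature mask (the $\mathrm{Attribute}_i$ bits) and the forest size $K$. This Python sketch is not the paper's algorithm: it covers only $O$ and $K$ (not $N$ or $W_t$), and `toy_oob_error` is a hypothetical stand-in for the OOB error of a trained forest. The step sizes, cooling rate, and penalty terms are arbitrary assumptions chosen so the example runs quickly.

```python
import math
import random


def toy_oob_error(mask, n_trees):
    """HYPOTHETICAL stand-in for the OOB error of a forest with n_trees trees
    trained on the features selected by mask. Features 0 and 2 are 'useful'."""
    useful = {0, 2}
    missing = sum(1 for i in useful if not mask[i])
    extra = sum(1 for i, bit in enumerate(mask) if bit and i not in useful)
    return 0.1 * missing + 0.02 * extra + 5.0 / n_trees + 0.0001 * n_trees


def anneal(n_features, objective, k_range=(1, 500), t0=1.0,
           cooling=0.95, steps=2000, seed=0):
    """Simulated annealing over a binary feature mask O and a forest size K."""
    rng = random.Random(seed)
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    k = rng.randint(*k_range)
    cost = objective(mask, k)
    best = (cost, mask[:], k)
    t = t0
    for _ in range(steps):
        new_mask, new_k = mask[:], k
        if rng.random() < 0.5:
            i = rng.randrange(n_features)
            new_mask[i] = 1 - new_mask[i]          # flip one bit of the binary code
        else:
            step = rng.randint(-20, 20)            # perturb the forest size
            new_k = min(max(k + step, k_range[0]), k_range[1])
        if any(new_mask):                          # keep at least one feature
            new_cost = objective(new_mask, new_k)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                mask, k, cost = new_mask, new_k, new_cost
                if cost < best[0]:
                    best = (cost, mask[:], k)
        t = max(t * cooling, 1e-6)                 # geometric cooling schedule
    return best


best_cost, best_mask, best_k = anneal(5, toy_oob_error)
print(best_mask, best_k, round(best_cost, 4))
```

In real use the objective would train a forest on the selected features and return its OOB error; for example, scikit-learn's `RandomForestClassifier(oob_score=True)` exposes OOB accuracy as `oob_score_`, so `1 - oob_score_` could serve as the error.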
Let the sample set be $F$, the minority class in the data set $F_{\min}$, and the majority class $F_{\max}$; the imbalance coefficient $E$ is computed from these classes. The classification ability of each DT in an RF differs, yet in the traditional RF algorithm every DT casts an equal vote, which is unreasonable. To improve the classification ability of the RF algorithm on unbalanced data, to keep its results from being seriously biased towards the majority class, and to make it perform well on the minority class of unbalanced data sets, each DT's vote is weighted, with the weight defined in terms of the following quantities.

Mobile Information Systems
$E_i$ is the imbalance coefficient of the data sampled from the training set for the $i$th DT, $\mathrm{min}_i$ is the number of minority samples that this DT predicts correctly, and $L$ is the number of data samples used to construct this DT.
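Once each DT has a voting weight, the forest's prediction for a sample is a weighted majority vote. The Python sketch below shows only that combination step; the weights are passed in by the caller, and the paper's formula for computing them from $E_i$, $\mathrm{min}_i$, and $L$ is deliberately not reproduced. The example weights and class names are hypothetical.

```python
from collections import defaultdict


def weighted_vote(tree_predictions, tree_weights):
    """Combine one sample's per-tree predictions by weighted majority vote.

    The weights are supplied by the caller; this sketch does not reproduce the
    paper's weight formula built from E_i, min_i and L.
    """
    totals = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        totals[label] += weight
    return max(totals, key=totals.get)


predictions = ["majority", "majority", "minority"]
print(weighted_vote(predictions, [1.0, 1.0, 1.0]))  # majority (equal votes)
print(weighted_vote(predictions, [0.4, 0.5, 1.2]))  # minority (weighted votes)
```

With equal weights this reduces to the ordinary RF vote; giving larger weights to trees that handle the minority class well is what lets a single tree overturn the majority.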
The factors are ranked by importance according to the Gini index, so that students can receive targeted tutoring based on the experimental results, the teaching staff's work can be guided, education management departments receive decision-making support, and teaching quality is improved. For the sample collection of students' sports achievements, $X = (X_1, \ldots, X_n)$ is the explanatory variable, that is, the names of the courses related to the course to be predicted, and $Y$ is the target variable, that is, the student's sports achievement to be predicted. The optimized prediction value of sports effect evaluation is

$$x_{n+1}^{t} = X_{m+1}(m). \qquad (10)$$

The structure of the student sports performance evaluation model based on the RF algorithm is shown in Figure 2.
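Ranking factors "according to the Gini index" is commonly done with the mean decrease in impurity: every split credits its feature with the sample-weighted drop in Gini impurity it produces. The Python sketch below aggregates hypothetical split records; the record layout and feature names are assumptions for illustration. It sums decreases over all trees and then normalizes, a simplification; library implementations such as scikit-learn's `feature_importances_` normalize per tree before averaging, so their numbers can differ slightly.

```python
from collections import defaultdict


def gini_importance(trees, n_samples):
    """Rank features by total decrease in Gini impurity (mean decrease impurity).

    Each tree is a list of split records
    (feature, n_node, gini_node, n_left, gini_left, n_right, gini_right).
    Decreases are summed over all trees and normalized to sum to 1.
    """
    scores = defaultdict(float)
    for tree in trees:
        for feature, n, g, n_l, g_l, n_r, g_r in tree:
            # Sample-weighted impurity decrease produced by this split.
            scores[feature] += (n * g - n_l * g_l - n_r * g_r) / n_samples
    total = sum(scores.values())
    ranking = [(feature, s / total) for feature, s in scores.items()]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical split records from two small trees, each trained on 10 samples.
forest = [
    [("x1", 10, 0.5, 5, 0.0, 5, 0.0)],
    [("x2", 10, 0.5, 5, 0.32, 5, 0.32), ("x1", 5, 0.32, 4, 0.0, 1, 0.0)],
]
ranking = gini_importance(forest, n_samples=10)
for feature, importance in ranking:
    print(feature, round(importance, 3))
```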
The basic idea of the RF algorithm is as follows: (1) from the original data set of students' sports scores, after the data preprocessing described above, N training sets of scores are randomly extracted by Bootstrap sampling.
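Step (1) can be sketched as follows. Each bootstrap set draws record indices with replacement until it is as large as the original data, and the records never drawn form that set's out-of-bag (OOB) sample, which is what the OOB error is computed on. The function name and the use of integer indices for score records are assumptions for illustration.

```python
import random


def bootstrap_sets(n_records, n_sets, seed=0):
    """Draw n_sets bootstrap training sets of record indices (sampling with
    replacement, each the size of the original data) and their OOB indices."""
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        in_bag = [rng.randrange(n_records) for _ in range(n_records)]
        drawn = set(in_bag)
        out_of_bag = [i for i in range(n_records) if i not in drawn]
        sets.append((in_bag, out_of_bag))
    return sets


for in_bag, out_of_bag in bootstrap_sets(n_records=1000, n_sets=3):
    # Roughly 1/e (about 36.8%) of the records are left out of each set.
    print(len(in_bag), len(out_of_bag) / 1000)
```

Because each tree sees a different resample, the trees make partly independent errors, which is what makes averaging or voting over them effective.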

Result Analysis
To verify the effectiveness of the designed model in sports effect evaluation, MATLAB is used for simulation analysis, and SPSS statistical analysis software is used for big data analysis of sports effect evaluation. The distribution sample length of the statistical data of sports effect evaluation is 1026, the feature sampling frequency is 1.44 kHz, and the training set size is 300. The results are shown in Figures 3 and 4.
It can be seen that the classification ability of this algorithm is improved compared with that of the RF algorithm. The AUC (Area Under Curve) value of this algorithm under default parameters equals the AUC value of the RF algorithm after parameter optimization, which indicates the effectiveness of the improved algorithm. The comparison of parameters also shows that optimizing parameters on a data set helps improve the algorithm's classification ability. Provided the Kappa coefficient and AUC value are maintained, the smaller the forest and the shallower the trees, the higher the efficiency and the better the algorithm. Data quality determines the classification result, so the data are preprocessed before being classified by RF, which effectively improves the sensitivity of RF on unbalanced data.
College is an important stage of students' personal development, during which all their physical skills develop vigorously. Excessive learning pressure may lead to physical and mental exhaustion, so we encourage students to take part in leisure sports. Moderate leisure sports can increase vital capacity, help students think quickly and clearly, and improve their learning efficiency.
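The two criteria used above, AUC and the Kappa coefficient, can be computed directly. This Python sketch uses the pairwise (Mann-Whitney) definition of AUC and Cohen's kappa; the labels and scores are hypothetical, and the quadratic AUC loop is meant only for small examples.

```python
from collections import Counter


def auc(labels, scores):
    """AUC as the probability that a random positive is scored above a random
    negative (ties count one half); O(n_pos * n_neg), fine for small sets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e); undefined when p_e == 1."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels (1 = minority class) and classifier scores.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))   # 0.75
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))   # 0.5
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is more informative than plain accuracy on unbalanced data.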
Through in-depth analysis of the existing achievements of students of the same major in universities, this paper explores the potential rules in achievement data: it samples a large amount of achievement data to form different training sets and then builds a number of DTs, each of which predicts the students' achievements. Finally, the …
Generally speaking, competition has advantages in independent tasks, because such tasks place few demands on cooperation among members, and the goal of individual activity is not to defeat others but to accomplish the task. As competition in modern society grows increasingly fierce, cultivating college students' competitive awareness and ability helps them adapt well to society after they leave school. To ensure fairness in the evaluation of college sports, it is convenient and accurate to use unified evaluation methods and means, to make subjective evaluation content tangible and quantifiable, and to evaluate with computer data processing software. In addition, statistical evaluation should be combined with expert evaluation to establish a fair and just evaluation system.
For the data set of students' sports achievements, the feature variables that require feature engineering include students' activity in class, absenteeism, the number of times students visit teaching resources after class, the number of times students participate in course discussions, and students' satisfaction with the course. The other feature variables use their original values, such as birthplace (as coded values) and gender (as a Boolean value of 0 or 1).
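The coding of the raw variables mentioned above can be sketched as below: birthplace becomes an integer code and gender a 0/1 value, while other fields pass through unchanged. The field names, the example records, and the choice of mapping "female" to 1 are hypothetical assumptions, not taken from the study's data set.

```python
def encode_records(records):
    """Encode birthplace as integer codes (in first-seen order) and gender as 0/1.

    The field names and the choice 'female' -> 1 are hypothetical; other fields
    are passed through unchanged.
    """
    codes = {}
    encoded = []
    for record in records:
        # setdefault assigns the next unused code the first time a place is seen.
        code = codes.setdefault(record["birthplace"], len(codes))
        gender = 1 if record["gender"] == "female" else 0
        encoded.append({**record, "birthplace": code, "gender": gender})
    return encoded, codes


students = [
    {"birthplace": "City A", "gender": "male", "absences": 2},
    {"birthplace": "City B", "gender": "female", "absences": 0},
    {"birthplace": "City A", "gender": "female", "absences": 1},
]
encoded, codes = encode_records(students)
print(codes)                             # {'City A': 0, 'City B': 1}
print([r["gender"] for r in encoded])    # [0, 1, 1]
```

Integer codes are acceptable for tree-based models such as RF, which split on thresholds; models that treat inputs as magnitudes would usually need one-hot encoding instead.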
In implementation, the discrete data are first normalized to [0, 1] by a log-function standardization method, and each factor is assigned a weight coefficient so that the students' classroom activity data can be weighted. Figure 5 compares the classification results of the students' sports performance data set under various basic DM algorithms, in an attempt to analyze the differences between them. The experimental results show that the RF algorithm has a slight advantage over the other algorithms on the student characteristic data set, but the overall difference is small.
Which attributes determine whether students pass the sprint? To answer this with DT technology, a whole-group sampling method is used to select some students' sports achievement records as the training set, and these records are copied to the training example worksheet. A worksheet function is then used to count the number of students who passed and failed each single subject in the training set, as shown in Table 1.
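The log-function normalization and factor weighting mentioned above might look like the following Python sketch. The exact formula is not given in the text, so using $\log(1+x)/\log(1+\max x)$ (which maps zero counts to 0 and the maximum to 1) is an assumption, as are the example counts and the factor weights.

```python
import math


def log_normalize(values):
    """Map non-negative values to [0, 1] with x' = log(1 + x) / log(1 + max(x)).

    Using log(1 + x) so that zero counts map to 0 is an assumption; the source
    only says a log-function standardization is used.
    """
    top = max(values)
    if top == 0:
        return [0.0 for _ in values]
    return [math.log1p(v) / math.log1p(top) for v in values]


def weighted_activity(normalized_factors, weights):
    """Weighted sum of one student's normalized factors (weights are hypothetical)."""
    return sum(w * x for w, x in zip(weights, normalized_factors))


visits = [0, 3, 15, 63]   # hypothetical after-class resource visit counts
print([round(x, 3) for x in log_normalize(visits)])                 # [0.0, 0.333, 0.667, 1.0]
print(round(weighted_activity([1.0, 0.5, 0.0], [0.5, 0.3, 0.2]), 2))  # 0.65
```

The logarithm compresses long-tailed counts, so a few very active students do not push everyone else towards 0 as plain min-max scaling would.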
The key to constructing a good DT lies in choosing good logical judgments or attributes. It has been found that, in general or with high probability, the smaller the tree, the stronger its predictive ability. To construct a DT that is as small as possible, the key is therefore to choose appropriate logical judgments or attributes. Here, information gain is used for attribute selection.
Based on the information gain of each attribute, long-distance running is selected as the root node of the DT. The remaining records are used as the test set to check the accuracy of the generated DT. The rules obtained from this DT are shown in Figure 6. Students' level in long-distance running directly affects their learning of sprint events, and learning hurdle events also has some influence on learning sprint events. Therefore, teachers should consider students' long-distance running foundation when teaching sprint, and students who are good at long-distance running but average at sprint should pay more attention to hurdle events.
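Choosing the root node by information gain, as in ID3, can be sketched as follows. The six records are made up and were constructed only so that one attribute is informative; they are not the study's Table 1 data, and the attribute names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder


def best_attribute(rows, attrs, target):
    """ID3 root selection: the attribute with the largest information gain."""
    return max(attrs, key=lambda a: info_gain(rows, a, target))


# Made-up records, not the study's data.
rows = [
    {"long_run": "good", "hurdles": "good", "sprint": "pass"},
    {"long_run": "good", "hurdles": "poor", "sprint": "pass"},
    {"long_run": "good", "hurdles": "poor", "sprint": "pass"},
    {"long_run": "poor", "hurdles": "good", "sprint": "fail"},
    {"long_run": "poor", "hurdles": "poor", "sprint": "fail"},
    {"long_run": "poor", "hurdles": "good", "sprint": "pass"},
]
print(best_attribute(rows, ["long_run", "hurdles"], "sprint"))  # long_run
print(round(info_gain(rows, "long_run", "sprint"), 3))          # 0.459
```

ID3 would then recurse on each branch with the remaining attributes until the subsets are pure or no attributes remain.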
Compared with academic courses, physical education classes are undervalued by many schools and parents, and because of the pressure of entrance examinations, the time students spend on physical exercise is shortened. Parents do not fully recognize the benefits of physical activity for students and attach too much importance to academic achievement, which leads them to neglect the cultivation of students' physical quality. Physical exercise can also have negative effects on the body, leisure time, and mood, such as fatigue after long or high-intensity sessions, which can impair learning. In addition, under the pressure of traditional ideas and a highly competitive social environment, physical exercise and academic performance appear to be negatively correlated.
Comparing the factors that affect students' physical exercise at each level shows that the individual's experience of physical education class or physical exercise is a common factor in students' participation in physical exercise. In behavioral learning theory, operant conditioning holds that learning is shaped by the consequences of responses. Consider students who perform poorly in physical education class: some are often criticized in class for not obeying discipline or for behaving badly. In this case, even if they have favorite sports, their performance in physical education class is restricted and their sports performance suffers, which affects their understanding of sports to a certain extent.
Other students, though not doing well in academic courses, have been active in physical education class, received positive responses, and had positive experiences there.
According to the above simulation parameters, the sports effect is evaluated, and the statistical distribution sample set is shown in Table 2.
Based on the results in Table 2, big data analysis of sports effect evaluation is carried out, and the resulting data mining results are shown in Figure 7.
According to the analysis of Figure 7, the confidence level of sports effect evaluation by this method is high, with an average confidence level above 0.96. The results of professional practice courses are mainly graded by teachers according to the experimental results produced by students. Inspection of the data shows that the scores of these courses are more concentrated than those of professional theory courses and differ little, so they contribute little to analyzing students' later professional courses. The experimental results are therefore consistent with the actual situation. Among the many factors behind students' poor physical performance, "exercise situation" has the greatest influence. For a student who never exercises, has a low class attendance rate, and has only an average foundation, examination results will be poor.
College students play an important role in society and bear a strong social mission. Whether they can fulfill their responsibilities and gain others' trust in sports not only matters for their role in sports but also prepares them, in advance, to enter society. Universities should strengthen college students' education in safety knowledge, improve their self-management ability, formulate reasonable and planned extracurricular exercise schedules, and, according to the values of the risk factors studied, select low-risk sports characterized by fun, safety, health, and positivity, so as to reduce the occurrence of risks.
Physical education teachers should constantly optimize their teaching methods. They should learn from one another, explore together, and sum up experience to form teaching methods suited to college students, guiding them to love sports, cultivating their interest in sports, and encouraging them to participate in sports consciously. Schools should jointly set up a leading organization for extracurricular sports, formulate corresponding rules and regulations, and carry out all tasks involved in organizing extracurricular sports activities. Through such activities, more college students will have the opportunity to participate in sports, feel the charm of sports, and experience its fun, so that they take part in exercise and improve their physical quality.

Conclusion
Physical education is an important part of school teaching. Making computer and network technology serve daily physical education teaching in schools is an inevitable requirement in today's big data era. Based on the RF algorithm, this paper puts forward a scientific evaluation algorithm for students' sports achievements. By combining pattern recognition and big data analysis methods, this research optimizes sports effect evaluation. The results show that the method achieves a high confidence level in sports effect evaluation and that its evaluation results are accurate and reliable. It enables early correction of and intervention in students' learning, so that students can achieve better results in professional courses, and it provides useful scientific guidance for assisting students' learning behaviors and optimizing educational decisions.
This study considers the relationship between participation in physical exercise and academic performance only from the perspective of a single school. Owing to the author's limited ability, it does not take into account factors such as differences between schools, which should be explored in future studies.

Data Availability
The labeled data set used to support the findings of this study is available from the corresponding author upon request.