Athletes’ State Monitoring under Data Mining and Random Forest

The study aims to help athletes reach top form and perform at their best in competition. Based on the relevant theoretical research, archers are taken as the research subjects, the characteristics of archery are analyzed, and the electroencephalogram (EEG) features of the athletes at different stages of precompetition training are monitored. An athletes’ competitive state monitoring model based on random forest (RF) is implemented and tested. The experimental results show that the athletes’ dominant frequency of brain band α, EEG entropy, central fatigue index, excitation inhibition index, and cerebral state index in precompetition training are significantly different from those in training (P < 0.05). The monitoring model classifies athletes’ competitive states. For some competitive states, its classification accuracy is higher than 90%, and its overall classification accuracy is 89.74%, higher than that of the support vector machine (SVM) classification model. The research provides a reference for monitoring athletes’ competitive states and helps them regulate their states in real time.


Introduction
Archery is popular in China and symbolizes China's traditional culture. It was listed as an event at the second Olympic Games in 1900 [1]. As a traditional sport, it has attracted more and more attention in recent years because of the excellent performance of Chinese archers in various world events. EEG monitoring is a method of recording brain activities using electrophysiological indexes. In the early 1950s, scientists in the former Soviet Union studied the application of EEG to sports training [2]. Schchumiller studied athletes' EEG and obtained their states at different stages of acquiring the relevant skills. Majiev et al. discussed the EEG features of athletes when they feel fatigued. Related research has become more and more extensive with the development of science and technology.
EEG is used to study concussion injury and recovery. Wilde et al. (2020) combined EEG and neurocognitive data. They proposed the indexes to enhance brain function, which can significantly change the diffusion of athletes suffering from concussion and have clinical application value for a concussion [3]. Zhao et al. (2021) used the deep learning method to analyze the EEG signals of athletes and designed a channel attention module connected to the input layer of convolutional neural networks (CNN) to reduce the risk of suffering from concussion again after recovery training [4]. Besides, EEG can be used to evaluate athletes' competitive state and guide their training. Duru and Assem (2018) used the psychological subtraction method to discuss the effectiveness of nerves and utilized EEG to measure the cognitive dynamics of karate athletes at rest and during exercise [5]. Sultanov and İsmailova (2019) explored the relationship between EEG rhythm oscillation and competitive anxiety with eyes open and eyes closed. They took young football players as the experimental subjects, tested their sports competition anxiety, recorded their prefrontal EEG with a single-channel mobile EEG system, and analyzed the EEG rhythm as a predictor of stress with a regression model. This study provides a method for predicting athletes' emotional states [6]. Bailey et al. (2019) used portable EEG equipment to track the psychological state of climbers during the climbing activity. The results show that climbers are more relaxed (α activity) at the critical moment and have more inward-directed attention (θ activity) when challenging more difficult routes [7]. Tharawadeepimuk and Wongsawat (2021) used the brain topographic map (absolute power) and brain connectivity (coherence and amplitude asymmetry) to analyze the psychological factors in sports. They evaluated athletes' performance in competition through noninvasive quantitative EEG [8]. Bieru et al. (2021) used EEG to record the brain activity of 12 judo athletes and 11 volleyball athletes during hand flexor contraction and relaxation, which helps coaches evaluate the training effect of athletes [9]. Zhu (2021) assessed the impact of EEG information and central nervous transmission on athletes' regulation and training, providing a basis for improving the level of archery training [10].
In short, the above studies mainly address athletes' concussion injuries and evaluate their psychological states via EEG, while few apply EEG to monitoring athletes' competitive states. Therefore, archers are taken as the research subjects to explore the characteristics of archery and test the EEG of the national team under different training states. An athletes' competitive state monitoring model is implemented based on the random forest (RF) and used to test the EEG of athletes in the precompetition training stage. The EEG characteristics of archers under different training states are analyzed. The evaluation criteria of the test indexes are constructed, and the model's performance is tested. The innovation is to classify the competitive state of athletes based on RF, so that coaches can intuitively and conveniently know the states of athletes. The study provides a fundamental tool for coaches to monitor and control athletes' competitive states, which has practical significance.

EEG.
Classifying the competitive states of athletes can better detect and adjust the states of athletes. The commonly used classifiers include the nearest neighbor algorithm, naive Bayesian classifier, radial basis function neural networks, and RF. The nearest neighbor algorithm is simple and easy to implement, but it performs poorly when the samples are imbalanced. Naive Bayesian works well on small-scale data and can complete multiple classification tasks, but it needs a priori probability before use. The structure of radial basis function neural networks is simple, but it cannot display the reasoning process. RF can highly parallelize the training. It can randomly select the node division features of a decision tree (DT) and mark the importance of each feature in the output results. The data trained after random sampling has small variance and strong generalization ability. Therefore, RF is selected as the classifier to test athletes' competitive states.
EEG is obtained by recording brain activities using electrophysiological indexes. EEG presents neural electrical activities that follow a specific law. Specifically, the frequency of the activities is 1~30 Hz and can be divided into six bands, namely, δ, θ, α1, α2, β1, and β2 [11]. The frequency of δ is 1~3 Hz, and its amplitude is 20~200 μV. This waveform can be measured in infancy or immature intellectual development and when people are exhausted, sleepy, or under anesthesia [12]. The frequency of θ is 4~7 Hz, and its amplitude is 5~20 μV. This waveform is more common among adults with frustrated will, depression, or psychosis. The frequency of α is 8~13 Hz (the average is 10 Hz), and its amplitude is 20~100 μV. It is the most common waveform in human brain waves. When people are quiet with their eyes closed, this waveform appears frequently, but when people open their eyes or receive other stimuli, it disappears immediately [13]. α1 represents the regulation factor; in this state, the human brain is concentrated and inspired [14]. α2 is a state in which the brain is highly awake, focused, and detached [15]. The frequency of β is 14~30 Hz, and its amplitude is 100~150 μV. This waveform appears when people are nervous and impassioned: the original slow wave immediately becomes a single fast waveform [16]. β can help athletes reduce tension and pressure and improve their ability to respond to and deal with emergencies. The β1 wave shows that the human brain is in a thinking state [17]. The β2 wave shows that the brain is alert and excited [18]. If the athlete's β waveform fluctuates significantly, the excitability of the athlete's central nervous system gets stronger, the speed and intensity of nerve activity are strengthened, and stress ability is improved, forming an excellent state to win in the competition.
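The band ranges quoted above can be turned into a small lookup. The following is a minimal sketch only: the function name `eeg_band` is hypothetical, and treating each quoted range as a half-open interval that extends to the start of the next band (so that, for example, 3.5 Hz counts as δ) is an assumption, since the text gives only integer ranges.

```python
# Band edges follow the ranges quoted in the text (delta 1-3 Hz, theta 4-7 Hz,
# alpha 8-13 Hz, beta 14-30 Hz). Extending each band up to the start of the
# next one is an assumption of this sketch, not something the text states.
BANDS = [
    ("delta", 1.0, 4.0),
    ("theta", 4.0, 8.0),
    ("alpha", 8.0, 14.0),
    ("beta", 14.0, 30.0),
]


def eeg_band(freq_hz):
    """Return the band name for a frequency in Hz, or None outside 1-30 Hz."""
    if freq_hz == 30.0:
        return "beta"  # include the upper edge of the 1-30 Hz range
    for name, low, high in BANDS:
        if low <= freq_hz < high:
            return name
    return None
```

The sub-bands α1, α2, β1, and β2 are not separated here because the text does not give their frequency limits.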

RF Model. RF is one of the tools for data mining. As its name implies, RF uses a random method to build a forest of DTs. In RF, the DTs are independent of one another. Therefore, DT should be discussed first in the study of RF [19].
DT is a commonly used classification method. Its generation falls into two steps. One is the splitting of nodes. When the attribute represented by a node cannot make judgments, this node should be divided into two subnodes (if it is not a binary tree, it will be divided into n subnodes). This node is called an internal node, and the node that can judge the attribute is called a leaf node, forming a tree structure. The other is the determining of the threshold. An appropriate threshold should be selected to minimize the classification error rate. Figure 1 shows the schematic diagram of DT.
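The threshold step can be illustrated for a single numeric feature and two classes. This is a sketch under stated assumptions, not the procedure used in the study: the function name `best_threshold`, the use of midpoints between neighboring values as candidate thresholds, and the 0/1 label coding are all choices made for illustration.

```python
def best_threshold(values, labels):
    """Pick the split threshold on one feature with the lowest error rate.

    values: list of numbers (one feature); labels: list of 0/1 class labels.
    Rule tested: predict `left_label` when value <= threshold, else the
    other class. Returns (threshold, left_label, error_rate), or None when
    all values are identical and no split is possible.
    """
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for t in candidates:
        for left_label in (0, 1):
            errors = sum(
                (left_label if v <= t else 1 - left_label) != y
                for v, y in zip(values, labels)
            )
            rate = errors / len(values)
            if best is None or rate < best[2]:
                best = (t, left_label, rate)
    return best
```

A node whose best threshold still leaves errors would be split again, which is how the internal nodes and leaf nodes described above arise.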
The commonly used DT algorithms are iterative dichotomizer 3 (ID3), C4.5, and classification and regression tree (CART). Among these, CART has the best classification effect. It selects the optimal feature through the GINI index minimization criterion, determines the optimal binary split point of the feature, and generates a binary tree. If there are K categories in CART and the probability that a sample point belongs to class k is $p_k$, the GINI index of the probability distribution is calculated by

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2. \tag{1}$$

Journal of Sensors
For a binary classification problem, if the probability that a sample point belongs to the first class is p, the GINI index of the probability distribution is calculated by

$$\mathrm{Gini}(p) = 2p(1 - p). \tag{2}$$

For the training sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where x is the feature vector and y is the sample class, the GINI index is calculated by

$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2. \tag{3}$$

In equation (3), $|C_k|$ is the number of sample points of class k in D.

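D is divided into $D_1$ and $D_2$ according to whether feature A takes its possible value a:

$$D_1 = \{(x, y) \in D \mid A(x) = a\}, \tag{4}$$

$$D_2 = D - D_1. \tag{5}$$

With $D_1$ and $D_2$ defined by equations (4) and (5), so that $D = D_1 + D_2$, the GINI index of D under the condition A = a is calculated by

$$\mathrm{Gini}(D, A = a) = \frac{|D_1|}{|D|} \mathrm{Gini}(D_1) + \frac{|D_2|}{|D|} \mathrm{Gini}(D_2). \tag{6}$$

RF is composed of multiple DTs. Each DT votes, and the class with the most votes is the final classification result of the test sample. The model is shown in Figure 2.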
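The GINI index of a sample set, the GINI index after a binary split on a feature value, and the voting step can be sketched in a few lines. The function names and the dictionary-based row format are hypothetical choices for illustration; ties in the vote are resolved by whichever class `Counter` lists first, which is an assumption, since the text does not say how ties are handled.

```python
from collections import Counter


def gini(labels):
    """GINI index of a label list: 1 - sum_k (|C_k| / |D|)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(rows, labels, feature, value):
    """GINI index of D after splitting on whether rows[i][feature] == value."""
    d1 = [y for x, y in zip(rows, labels) if x[feature] == value]
    d2 = [y for x, y in zip(rows, labels) if x[feature] != value]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


def majority_vote(tree_predictions):
    """Final RF label: the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]
```

CART would evaluate `gini_split` for every candidate feature value and keep the split with the smallest result; a split that separates the classes perfectly gives 0.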

Theoretical Research on Competitive States.
Competitive states are the mental activities in competitive sports events and training. Stone et al. believe that the competitive state is the best preparation for sports performance that athletes obtain through corresponding training [20]; it is the best short-term psychological and physical state. Some Chinese scholars define competitive states as the instant states of athletes when they compete. These states change dynamically, and the state in which athletes perform best psychologically and physically is called "the best competitive state." Bompa, a Romanian-Canadian sports training scholar, proposed that the competitive state can be measured and evaluated, and he classified competitive states accordingly. If an athlete achieves more than 98% of last year's best result, the athlete is in the best competitive state; at 96.5%~98%, the state is normal; at 95%~96.5%, the state is poor; and below 95%, the state is the worst. Here, the view of Chinese scholars is adopted: the competitive state is an instant state when athletes participate in the competition.
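Bompa's percentage bands amount to a simple threshold rule, sketched below. The function name `competitive_state` is hypothetical, the sketch assumes that a higher result is better (as with an archery score), and the assignment of the exact boundary values 95%, 96.5%, and 98% to one band or the other is an assumption, because the text leaves the boundaries ambiguous.

```python
def competitive_state(result, best_last_year):
    """Map a result to a state label using the percentage bands quoted above.

    Assumes a higher result is better. Boundary handling (which band gets
    exactly 95%, 96.5%, or 98%) is an assumption of this sketch.
    """
    pct = 100.0 * result / best_last_year
    if pct > 98.0:
        return "best"
    if pct >= 96.5:
        return "normal"
    if pct >= 95.0:
        return "poor"
    return "worst"
```

For example, with a previous best of 700 points, a score of 680 is about 97.1% of the best result and falls in the normal band.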

Characteristics of Archery.
Archery is an ancient sport. Athletes complete the bow-drawing and shooting action while standing still and coordinating their force. Archery therefore highlights the role of muscular antifatigue ability during long training sessions [21]. The basic requirements of archery are to be fast, accurate, and stable. "Fast" means that athletes' technical actions should be clear and fast, and that athletes should respond and adjust quickly to changes in the external environment. "Stable" means that athletes need to overcome external and internal interference, perform steadily at their technical level, and control their emotions. "Accurate" means that athletes can execute their technical actions and hit the target accurately in the competition. Compared with athletes in other sports, archers are strongly affected by their psychological load. Whether in training or in competition, athletes are under tremendous psychological pressure. The sport requires concentration, sensitive and accurate proprioception, precise nerve control, fast response, good information processing ability, decisive decision-making, and consistent performance under huge stress, all of which depend on the quality of athletes' brain function. The adaptation level of archers to the psychological load is directly reflected in changes in the central nervous system. Because it is simple, noninvasive, and repeatable, EEG is one of the important tools of clinical medicine and brain cognitive science. Research on athletes' EEG features can help athletes regulate their brain mechanisms in daily training and competition, enabling them to participate in the competition in the best form and achieve the best results.
Some scholars found that athletes' EEG features could reflect their tension and competitive states before the competition. Some scholars analyzed the EEG of swimmers the day before the match and found special spatial configurations of serotonin, acetylcholine, and dopamine in their brain center. This demonstrates that athletes' psychological states change before the competition and that their psychological loads before the competition can be monitored through their EEG. Figure 3.
Classification accuracy is an essential measure of the classification results of the RF model. It is the proportion of correctly classified data to the length of the whole test set, and it is calculated by

$$\mathrm{Accuracy} = \frac{T_{\mathrm{correct}}}{T_{\mathrm{all}}} \times 100\%. \tag{7}$$

In equation (7), $T_{\mathrm{correct}}$ is the number of data in the test set that RF classifies correctly, and $T_{\mathrm{all}}$ is the data length of the whole test set.
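The accuracy ratio described above is straightforward to compute from predicted and labeled states. The sketch below uses a hypothetical function name and returns the ratio as a fraction between 0 and 1; multiplying by 100 gives the percentage form used in the results.

```python
def classification_accuracy(predicted, actual):
    """Share of test samples whose predicted state matches the labeled state."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("the test set is empty")
    # t_correct: samples classified correctly; t_all: size of the test set.
    t_correct = sum(p == a for p, a in zip(predicted, actual))
    t_all = len(actual)
    return t_correct / t_all
```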

Experimental Methods. Selection of subjects is as follows: the sample set of the experiment is the archers of the national training team, and the total number of subjects is 57.
Test steps are as follows: test each athlete with EEG, and have each athlete fill in the athlete's psychological fatigue questionnaire and the automatic force rating table.
Test indexes are as follows: dominant frequency of α brainwave, EEG entropy, central fatigue index, excitation inhibition index, and brain functional state index.
The dominant frequency of brain band α [22] is as follows: it is based on the probability of each waveform. When the probability of the dominant frequency is at its maximum, the probabilities of the other frequencies decrease, and the information in the brain is more concentrated. Conversely, when the probability of the dominant frequency decreases, the brain's concentration decreases. The index thus reflects how concentrated athletes' attention is on a specific event.
EEG entropy [23] is as follows: it shows the uncertainty of brain band α and the order of the dominant frequency. It also reflects the response of athletes to external interference. The entropy value of EEG is between 0 and 1. The smaller the entropy value is, the lower the uncertainty of brain band α is, the better the order of the dominant frequency is, and the smaller the influence of external interference on athletes is. Conversely, a larger entropy value shows that the information in the athletes' brains is messy and that the impact of external interference is great.
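One way to make the two indexes above concrete is to treat the α band as a histogram over frequencies and compute the share of the dominant frequency and a normalized Shannon entropy. This is a sketch under loud assumptions: the text does not give the formulas used by the EEG system, so the histogram input, the function names, and the normalization by the logarithm of the number of bins (chosen only because it yields values between 0 and 1) are all illustrative.

```python
import math


def band_probabilities(frequency_counts):
    """Turn counts per frequency (Hz -> count) into probabilities."""
    total = sum(frequency_counts.values())
    return {f: c / total for f, c in frequency_counts.items()}


def dominant_frequency(frequency_counts):
    """Frequency with the highest probability, and that probability."""
    probs = band_probabilities(frequency_counts)
    f = max(probs, key=probs.get)
    return f, probs[f]


def normalized_entropy(frequency_counts):
    """Shannon entropy divided by log(number of bins), so it lies in [0, 1].

    This normalization is an assumption of the sketch, not the formula of
    the measuring system described in the text.
    """
    probs = [p for p in band_probabilities(frequency_counts).values() if p > 0]
    n_bins = len(frequency_counts)
    if n_bins < 2:
        return 0.0
    h = sum(p * math.log(1.0 / p) for p in probs)
    return h / math.log(n_bins)
```

When all activity sits at one frequency, the dominant-frequency probability is 1 and the entropy is 0, matching the "concentrated" case; when activity is spread evenly over the band, the entropy approaches 1, matching the "messy" case.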
Major neurotransmitter levels include γ-aminobutyric acid (GABA), glutamic acid (Glu), acetylcholine receptor (AchR), acetylcholine (Ach), 5-hydroxytryptamine (5-HT), noradrenaline (NE), and dopamine (DA). Among them, GABA is an inhibitory neurotransmitter, and it greatly affects the excitability of neurons. Glu is an excitatory neurotransmitter. AchR includes muscarinic receptors and nicotinic receptors. The former produces a parasympathetic excitatory effect, and the latter can excite postganglionic neurons in an autonomic ganglion. Ach keeps the human brain conscious. 5-HT is a messenger and can produce pleasant emotions. It has an important impact on regulating brain activities such as emotion and energy. NE has both inhibitory and excitatory effects. DA is related to human desire and feelings and conveys excitement and happiness.

Central fatigue index [24] is as follows: a value used to show the degree of fatigue.
Excitation inhibition index [25] is as follows: a value used to reflect whether the brain is excited.
Brain function state index [26] is as follows: a value used to reflect brain synergy.
Data processing is as follows: the athletes are scored according to the questionnaires, and then their training status is evaluated (each of the two questionnaires accounts for 50% of the score). The athletes' psychological fatigue questionnaire was developed by Raedeke and Smith in 2001. In the questionnaire, athletes' psychological states fall into three dimensions, namely emotional/physical exhaustion, reduced sense of achievement, and negative evaluation of sports, covered by 15 questions. The survey results of the athletes' psychological fatigue questionnaire are divided into five grades: "never," "rarely," "sometimes," "often," and "always," with scores of 5 points, 10 points, 15 points, 20 points, and 25 points, respectively. The automatic force rating table is also divided into five levels, namely "very relaxed," "relaxed," "slightly laborious," "laborious," and "very laborious," and the corresponding scores are 5 points, 10 points, 15 points, 20 points, and 25 points, respectively.
When the athlete's final score is 5 points, 10 points, 15 points, and 20 points, it indicates that the athlete is not tired. When his score is 25 points, the athlete is in a state of fatigue.
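The scoring rule can be sketched as an equally weighted combination of the two questionnaire scores followed by the fatigue cutoff at 25 points. The function names are hypothetical, and one point is an open assumption: the text does not say how combined scores that are not multiples of 5 (for example 22.5) are mapped to the five grades, so this sketch leaves them unrounded and treats anything below 25 as not fatigued.

```python
def training_state_score(fatigue_score, exertion_score):
    """Combine the two questionnaire scores with equal (50%) weights."""
    valid = {5, 10, 15, 20, 25}
    if fatigue_score not in valid or exertion_score not in valid:
        raise ValueError("each questionnaire score must be 5, 10, 15, 20, or 25")
    return 0.5 * fatigue_score + 0.5 * exertion_score


def is_fatigued(final_score):
    """Per the text, a final score of 25 marks fatigue; lower scores do not."""
    return final_score >= 25
```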
In the questionnaire surveys, 57 copies of the athletes' psychological fatigue questionnaire are distributed, and 57 are recovered, with a recovery rate of 100%. The number of effective questionnaires is 57, and the effective rate is 100%. Likewise, 57 automatic force rating questionnaires are distributed, and 57 are recovered, with a recovery rate of 100%. The number of effective questionnaires is 57, and the effective rate is 100%.

Brain Function Feature Data of Different Competitive States. The questionnaire survey results on 57 athletes of the national training team are shown in Figure 4. Figure 4 shows that the number of athletes with 5 points is small: only 3 athletes' psychological states are "relaxed." The numbers of athletes with 15 points and 20 points are larger, at 16 and 19, respectively, indicating that the overall competitive state of the athletes is poor.
The dominant frequency of the athletes' brain band α in different competitive states is tested, and the average variance under different training degrees is calculated. The results are shown in Figure 5. Figure 5 shows that when the scores of training states are 5 points, 10 points, 15 points, and 20 points, the values of the athletes' dominant frequency of brain band α are close to one another, and the difference is not statistically significant. When the athletes' training score is 25 points, the values of their dominant frequency of brain band α are lower than those in the other states. Compared with the nonfatigue states of 5 points, 10 points, 15 points, and 20 points, the difference is statistically significant (P < 0.05).
The values of athletes' dominant frequency of brain band α in different training states are analyzed, and the corresponding scoring criteria are shown in Table 1.
The EEG entropy of athletes under different training states is tested, and the test results are shown in Figure 6. Figure 6 shows that when the score of the training state is 25, the athletes' average EEG entropy is higher than that in the other training states, and the difference is statistically significant (P < 0.05). The differences in EEG entropy among the other scores are not statistically significant.
The scoring standard of the athletes' EEG entropy test in different training states is established, as shown in Table 2.
The primary neurotransmitter levels of athletes under different training states are tested, and the test results are shown in Figure 7. Figure 7 shows that as the training state score increases, the 5-HT and Ach levels of athletes rise, while DA shows a low-high-low pattern. As athletes continue to exercise, their dopamine level gradually rises. When the athletes feel fatigued, their states become worse and central fatigue gradually appears. Under different training states, athletes' 5-HT, Ach, and central fatigue indexes change regularly. Their levels of 5-HT and Ach increase when the central nervous system is short of oxygen and glucose. Based on the above, only the evaluation standard that uses the central fatigue index to evaluate athletes' fatigue states is discussed. The scoring standards of the central fatigue index and the excitation inhibition index of athletes in different training states are shown in Tables 3 and 4.
The brain state indexes of athletes in different training states are tested, and the test results are shown in Figure 9. Figure 9 shows that when the training state score of athletes is 25 points, the brain state index of athletes is the highest, which is significantly different from that of athletes at 5 points, 10 points, 15 points, and 20 points (P < 0.05). There is no significant difference in athletes' brain function state index among the other scores.
The evaluation criteria of brain state indexes of athletes in different training states are established, as shown in Table 5.

Model Performance Test.
According to the evaluation criteria, the competitive states of athletes are divided into five levels: excellent, good, general, poor, and very poor. The RF monitoring model and the classification model based on SVM are each used to monitor the competitive states of athletes, and the classification accuracy results are shown in Figure 10. Figure 10 shows that the RF model classifies the competitive states of athletes with greater accuracy than the SVM model. When the competitive state of athletes is excellent or good, the accuracy gap between the RF model and the SVM model is small, but the gap is larger when the competitive state is general or poor. When the state is very poor, the classification accuracy of the RF model is more than 90%, much higher than that of the SVM model. The overall classification accuracy of the RF model is 89.74%, much higher than the 80.35% of the SVM model. This shows that the RF model is better at detecting athletes' competitive states.
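A per-state comparison like the one reported above requires the accuracy to be computed separately for each of the five levels. The following sketch shows one way to do that; the function name and the list-based inputs are hypothetical, and the same function would be applied once to the RF predictions and once to the SVM predictions.

```python
from collections import defaultdict

STATES = ["excellent", "good", "general", "poor", "very poor"]


def per_state_accuracy(predicted, actual):
    """Accuracy computed separately for each true competitive state."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, a in zip(predicted, actual):
        total[a] += 1
        correct[a] += int(p == a)
    # States with no samples in the test set are left out of the result.
    return {s: correct[s] / total[s] for s in STATES if total[s] > 0}
```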

Conclusion
The competitive ability of athletes is affected by their competitive states. In the training stage, coaches usually regulate the competitive state of athletes by means of training rhythm control and psychological counseling, so that athletes can participate in the competition in the best form and play their best in the match. The characteristics of archery are analyzed, and athletes' competitive states are monitored through their EEG features. The athletes' training state is evaluated using the questionnaire surveys, the EEG feature data under different training states are collected, and the EEG characteristic evaluation criteria are established. The criteria provide a basis for constructing the classification standard of athletes' competitive states. An athletes' competitive state monitoring model is implemented based on RF and tested. The experimental results show that when the athletes' training state is 25 points, their dominant frequency, EEG entropy, central fatigue index, excitation inhibition index, and brain state index are significantly different from those of the other athletes (P < 0.05). The athletes' competitive states are classified using the monitoring model, and its classification accuracy is greater than that of the SVM model. The overall classification accuracy is 89.74%, higher than that of the SVM model. The research helps coaches regulate athletes' competitive state in training, but there are still some shortcomings. For example, the sample size is small, and the model still has much room for improvement, which will be the focus of the follow-up research.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The author declares no competing interests.