Injury Risk Prediction of Aerobics Athletes Based on Big Data and Computer Vision

In recent years, competitive aerobics has been rapidly popularized and developed, and the level of sports skills has also greatly improved. The performance in some events has gradually approached and reached the advanced level. Therefore, it is vital to invest in quantitative analysis and cross-disciplinary comprehensive research of aerobics performance and related factors. This paper adopts big data analysis technology and computer vision technology based on a convolutional neural network, together with the related theories of sports biomechanics and computer image recognition, to establish an injury risk prediction model for aerobics athletes. First, the approach uses big data analysis to analyze the characteristics of competitive aerobics sports data. Second, it uses a convolutional neural network to visually recognize aerobics sports images and establishes a two-branch prediction model. Finally, the outputs of the two branches are fused to accurately diagnose and evaluate the level of physical fitness development of aerobics athletes, clarify the focus and goals of training content, and improve the scientific degree of aerobics training. The study can help predict the injury risk of aerobics athletes through applications of big data and computer vision.


Introduction
In recent years, competitive aerobics has developed rapidly in our country, and the corresponding sports injury risks have gradually increased. A number of studies have shown that, owing to the characteristics of aerobics itself, such as strict time requirements, difficult movements, fast-paced music accompaniment, and coherent coordinated movements, athletes suffer sports injuries [1][2][3] if they do not pay attention. The shoulders, elbows, wrists, waist, thighs, knees, calves, and ankles are the parts most prone to injury during aerobics training, and among them the ankle joint is the most prone to injury [4,5]. In addition, the injuries most likely to occur in competitive aerobics athletes are closed injuries, mainly joint strains, sprains, muscle strains, and chronic injuries [6][7][8][9]. However, current research on aerobics injuries usually relies on questionnaire surveys or expert interviews to determine the injured parts and possible injury mechanisms of aerobics exercises, and objective empirical research is lacking.
In addition, teenagers are in the golden stage of physical development, and various physical qualities improve significantly during this period. However, interviews with young aerobics athletes and coaches show that young athletes have more injuries than adult athletes. This is due to the weakness of the muscles and joints of young athletes, which limits skill development, and to long-term training with irregular technical movements and body postures. In the adolescent period of aerobics athletes, scientific and reasonable training can not only promote the physical development of adolescents but also improve their athletic ability more effectively [10]. Therefore, timely discovery of the causes of injury to young athletes and timely prevention are critical to improving the skill level of young athletes and extending their sports life.
Due to the characteristics of competitive aerobics, athletes are required to complete a series of high-intensity movements in a short time, which requires a high level of physical fitness and flexibility. Studies have confirmed that long-term high-intensity repetitive training and asymmetric sports skills and postures increase the risk of athletes' injuries. At the same time, adolescence is a special period of physical development and a period of high incidence of sports injuries [11]. Therefore, this research is based on the developmental characteristics of adolescents' physical ability and the special characteristics of aerobics. It takes college students in a certain city as the research object, conducts tests in groups of different genders and aerobics sports grades, and uses the receiver operating characteristic (ROC) curve to formulate evaluation standards, analyze the physical weaknesses of young aerobics athletes, and evaluate the risk of noncontact injury, in order to provide a theoretical reference for subsequent aerobics training for young people. This paper presents a method of motion recognition for aerobics athletes based on machine vision, which recognizes joint strain, sprain, and muscle strain caused by their movements, and uses big data to train the vision-based deep learning [12][13][14][15][16] algorithm. The main innovative points of this paper are as follows:
(i) This article proposes a method of motion recognition for aerobics athletes based on machine vision, which recognizes joint strain, sprain, and muscle strain caused by their movements.
(ii) This paper constructs a dual-branch injury risk prediction model for aerobics athletes. One branch uses big data to analyze the characteristics of aerobics athletes' movement injuries, and the other branch builds a deep convolutional neural network model to identify joint strain, sprain, and muscle strain and to perform counting and prediction.
(iii) We have conducted comparative experiments and ablation studies to demonstrate the effectiveness of the proposed algorithm based on big data and computer vision. It can be used to discover the causes of injury to young athletes and prevent them in time, which is useful for improving the skills of young athletes and prolonging their sports life.
The organization of the paper is as follows: Section 2 reviews research related to the proposed area. Section 3 presents the methodology of the proposed study with details of the approach. Section 4 reports the experiments and results. The paper is concluded in Section 5.

Related Research
Various approaches and techniques have been devised in the literature for injury risk prediction of aerobics athletes.
Fanian et al. [17] studied 6755 cases of sports injuries and found that the sport with the highest incidence is football; the common injury sites are knee joints (15%), calves (11.9%), and wrist joints (11%). The most common types of injuries are fractures (12.8%), cartilage injuries (6.87%), and contusions (2.5%); 12% of the patients underwent surgery, and the average hospital stay was 3-4 days. Michael [18] surveyed athletes in various sports such as gymnastics, basketball, football, and running. After summarizing and analyzing the types and characteristics of sports injuries in different sports, the author proposed diagnosis and treatment methods for various types of sports injuries and pointed out two major steps to reduce their occurrence: one is to wear protective gear on vulnerable parts to reduce the probability and severity of injury; the other is to improve athletes' physical fitness as much as possible, including strength, agility, and flexibility, to reduce the chance of injury. Wiese-Bjornstal et al. [19] pointed out that sports injury is relatively difficult for athletes to accept. After a sports injury occurs, athletes experience complex psychological responses during training and rehabilitation, mainly reflected in cognition, emotion, and behavior. When athletes go through the process of recovering from injury and returning to training, they often have cognitive and emotional reactions, which are mainly affected by the individual and the environment. In addition, the timing of the injury also affects the psychology of injured athletes to varying degrees. When sports injuries occur shortly before major competitions, the athletes' sense of disappointment and despair is greatly intensified [20].
Malliou et al. [21] used questionnaire surveys to collect and analyze the injuries of athletes engaged in aerobics training and found that, in the injured population, lower limb injuries accounted for 97.3% and ankle and knee injuries were the most common. They also pointed out that training time, years of training, and training level have an impact on injury. Bintoudi et al. [22] investigated two aerobics pedal athletes and found that they had knee joint pain and fat pad edema; the article discussed the possible pathogenic factors and mechanisms. Kiesel et al. [23] found in a study of professional American rugby players during the off-season that scores on the Functional Movement Screen (FMS) test are closely related to the athlete's sports injury risk. Through the calculation of the receiver operating characteristic curve, a benchmark score of 14 points was determined for rugby players: athletes with a season average FMS score below 14 points have a much higher risk of injury than those scoring above 14 points. In subsequent similar research, the authors found that after targeted intervention training, the number of athletes with FMS scores greater than 14 increased significantly, and movement asymmetry problems could also be improved. Dennis Rex [24] conducted FMS tests on 67 college football players and combined the results with their lower limb explosive power and season injury data. The study concluded that the FMS test can be used as an assessment tool to predict serious sports injuries and noted that athletes with an FMS score below 11 points have 9 times the sports injury risk of other athletes and require timely corrective training. Dorrel et al. [25] followed up on the injuries of 257 college athletes after conducting FMS tests and found that athletes who scored less than 15 on the FMS test were more likely to get injured.
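The ROC-based cutoff selection mentioned above can be illustrated with a minimal Python sketch. It assumes that a lower screening score means higher risk and that the cutoff is chosen by maximizing Youden's J statistic; the cited studies do not state their selection criterion here, so that choice, the function names, and the toy data are all illustrative assumptions.

```python
def roc_points(scores, injured):
    """Return (cutoff, sensitivity, specificity) for each candidate cutoff.

    An athlete is flagged as high risk when score <= cutoff
    (assumption: lower screening scores indicate higher risk).
    """
    positives = sum(injured)
    negatives = len(injured) - positives
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, injured) if s <= cutoff and y)
        fp = sum(1 for s, y in zip(scores, injured) if s <= cutoff and not y)
        points.append((cutoff, tp / positives, 1 - fp / negatives))
    return points


def best_cutoff(scores, injured):
    """Pick the cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(scores, injured), key=lambda p: p[1] + p[2] - 1)[0]


# Toy data invented purely for illustration; not taken from any study.
toy_scores = [10, 11, 12, 13, 14, 14, 15, 16, 17, 18]
toy_injured = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(best_cutoff(toy_scores, toy_injured))
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, could give a different cutoff on the same data.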

Methodology
This section mainly introduces the big data platform technology and the theories related to the research of this article. Figure 1 shows examples of aerobics sports injuries.
This section first introduces the core architecture, working principle, and related ecosystem of the Hadoop platform, then introduces the basic ideas of data mining theory for big data analysis and outlines the common data mining algorithms. Finally, the principle of the convolutional neural network [26][27][28][29][30][31] and the two-branch algorithm proposed in this paper are explained. Figure 2 shows the overall framework of the proposed approach.

Data Mining and Analysis.
The following subsections present the data mining and analysis part of the paper.

Big Data Platform.
Hadoop is currently one of the big data platforms widely used by relevant research institutions in the industry. It is an open-source top-level project maintained by the Apache Foundation. Hadoop inherits the idea of distribution and fully applies it to data storage and processing technology.
The Hadoop ecosystem is also composed of many open-source software components. Because the entire ecosystem is open source, its software can be supervised and maintained by everyone, which gives the system a relatively high guarantee of stability and security.
Moreover, the open source of the entire system greatly facilitates the use of users. Big data computing is no longer the unique ability of some professional organizations and has gradually become a field that every developer can set foot in. Figure 3 is its ecosystem architecture.
HDFS is the distributed file system of Hadoop. As the data foundation of the entire ecosystem, HDFS is located at the bottom of the ecosystem. It has very low requirements for hardware resources, can run on computers with low configuration, and can ensure the safety and stability of data in the event of hardware failure through redundant storage.
The file system divides each file into multiple parts called data blocks, and each data block is replicated into three copies stored in different locations, which provides both high-throughput data access and extremely high fault tolerance. Figure 4 shows the HDFS architecture diagram. As the manager in HDFS, the NameNode undertakes the core tasks and is responsible for managing the mapping information of data blocks. The Secondary NameNode serves as a backup for the NameNode and exists to cope with the single point of failure that may occur in the NameNode. HDFSClient is a client that accesses HDFS, and all client requests first interact with the NameNode. Each DataNode maintains interaction with the NameNode and serves as the slave node that stores data blocks.
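The block splitting and three-way replication described above can be sketched in a few lines of Python. This is a toy model, not HDFS code: the round-robin placement, the function names, and the node names are assumptions made for illustration, and a real cluster uses its own placement policy.

```python
def split_into_blocks(file_size, block_size):
    """Return the size of each block a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes (toy round-robin)."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    return {
        block_id: [datanodes[(block_id + r) % len(datanodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# Hypothetical sizes and node names, chosen only to keep the example small.
blocks = split_into_blocks(file_size=300, block_size=128)
block_map = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```

In this model, `block_map` plays the role of the block-to-DataNode mapping that the NameNode manages; losing any single node still leaves two copies of every block.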
MapReduce is the computing engine of Hadoop. JobTracker is the core of MapReduce and is responsible for all task allocation and job scheduling. JobTracker decomposes jobs into tasks and distributes them to the TaskTracker nodes for execution. Each TaskTracker runs map and reduce tasks and regularly reports their status to JobTracker. There are many other important components in the Hadoop ecosystem. HBase is a high-performance, scalable column-oriented database for structured data. Hive, which originated at Facebook, is a Hadoop-based data warehouse mainly used for statistics over massive structured data. The biggest feature of Hive is that it can convert SQL into MapReduce jobs so that they can be executed on Hadoop. Zookeeper is used to solve the problem of application coordination in a distributed environment. Sqoop is responsible for data transfer between traditional databases and Hadoop. Pig converts scripts into MapReduce jobs and executes them on Hadoop. Mahout is a library of machine learning algorithms for Hadoop. Flume is an open-source log collection system.
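The map, shuffle, and reduce flow that MapReduce distributes across TaskTracker nodes can be imitated in a single Python process. The sketch below is not Hadoop code; the record layout (a `body_part` field) and the sample records are hypothetical and only show how an injury count per body part would decompose into the two phases.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (body_part, 1) for one injury record."""
    yield record["body_part"], 1


def shuffle(pairs):
    """Shuffle: group emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially on a list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical records invented for illustration.
injury_records = [{"body_part": "ankle"}, {"body_part": "knee"}, {"body_part": "ankle"}]
injury_counts = run_job(injury_records)
```

On a real cluster the map calls would run in parallel on the nodes holding the data blocks, and the framework would perform the shuffle between nodes.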

Data Mining.
As a data processing technology, data mining aims to discover the laws and knowledge hidden behind real-world phenomena through a series of operations and calculations on data. It is also one of the current research hotspots in science. Data mining technology draws on the advantages of a variety of related technologies and can extract hidden valuable information from real data to provide references for actual activities. From the methodological point of view, data mining can be divided into two categories: description and prediction. The similarity between the two is that both derive laws from existing data. The difference is that description aims to provide interpretation support for the laws in the data, whereas prediction aims to provide forecasts for actual activities.
The functions of data mining are divided into the following categories:
(1) Concept or class description: This kind of data mining mainly describes the class or concept of data through data differentiation and characterization. Data differentiation distinguishes the data to be mined by constructing a comparative dataset. Data characterization first queries existing related datasets and then summarizes their characteristics.
(2) Predictive modeling: Prediction methods are divided into two categories, classification and regression, which are applied to discrete and continuous variables, respectively. Predictive modeling derives a model through training so that the error between the predicted value and the actual value of the specified variable is minimized. This type of algorithm is commonly used in disease risk prediction.
(3) Association analysis: Used to describe associated features in the data; the discovered patterns are usually expressed in the form of implication rules or feature subsets. The association rules derived from association analysis can reveal the dependence among elements in the dataset and indicate the conditions under which attributes appear in the dataset with a specific frequency. For example, association analysis of multiple symptoms of patients can find the association rules that hold between the symptoms.
(4) Cluster analysis: The main function of this type of data mining algorithm is to divide the dataset into valuable or meaningful groups. Cluster analysis and classification analysis are similar but not identical. The difference is that cluster analysis is unsupervised: the class labels must be obtained from the data, and the number of classes is not known before clustering.
(5) Anomaly detection: Based on the analysis of the dataset, abnormal characteristics or abnormal data are identified. It is often used in the diagnosis of unusual diseases and the detection of abnormal network traffic.
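As a concrete illustration of association analysis, the following Python sketch computes the support and confidence of one rule over a handful of symptom records. The records and symptom names are invented for illustration and do not come from the study's data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A).

    Assumes the antecedent occurs at least once in the transactions.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical symptom records, invented for illustration.
symptom_records = [
    {"ankle_pain", "swelling"},
    {"ankle_pain", "swelling", "bruising"},
    {"ankle_pain"},
    {"knee_pain"},
]
rule_support = support(symptom_records, {"ankle_pain", "swelling"})
rule_confidence = confidence(symptom_records, {"ankle_pain"}, {"swelling"})
```

A rule such as "ankle_pain implies swelling" would be reported only if both values exceed user-chosen minimum support and minimum confidence thresholds.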

Data Mining Algorithm.
As data mining continues to incorporate new domain knowledge, data mining algorithms are constantly evolving and mining methods are becoming more abundant. Specific mining algorithms can be chosen according to mining needs. The following are several commonly used data mining algorithms:
(1) Neural network: A neural network is an intelligent algorithm that simulates signal transmission in the human brain. It is generally composed of three parts: the input, output, and hidden layers. There are three kinds of models: the feedforward network model, the feedback network model, and the self-organizing network model. Neural networks have a strong ability to process nonlinear data, good fault tolerance, and high classification accuracy, but their results are difficult to interpret. The most commonly used model in the medical field is a multilayer feedforward neural network, the back propagation (BP) neural network. It is worth noting that the neural network used in this article is a convolutional neural network.
(2) Decision tree: A decision tree is a tree-structured model with classification rules whose logical branches run from top to bottom. The decision tree selects the root node among the variable attributes according to measures such as information gain and the Gini coefficient and then splits on the attribute of the root node to form branches; each branch node then tests the variable attributes again and continues to branch, and so on, until the category of a node is homogeneous or a set threshold is reached. The tree can be converted into classification rules that classify diseases according to their symptoms, thereby predicting diseases. Common algorithms include the ID3, C4.5, and CART algorithms.
(3) Clustering algorithm: The purpose of a clustering algorithm is to obtain several classes through computational analysis; it is an unsupervised learning method. Data in different classes are usually uncorrelated or differ in certain ways, whereas data within the same class show a certain correlation or similarity. The k-means algorithm is the most common clustering algorithm.
(4) Association rules: Association rule mining is the process of finding strong association rules that satisfy a specified minimum support and minimum confidence. It usually consists of two parts: finding all frequent item sets and then finding the association rules within the frequent item sets. Common algorithms include the Apriori algorithm and the FP-growth algorithm.
(5) Association classification: Association classification is one of the important classification methods. It combines association rule mining and classification: association classification rules are extracted first, and a model is then built to predict unknown instances. Association classification algorithms usually consist of three parts: rule generation, rule sorting and pruning, and prediction of new instances. Common algorithms include the CBA, CMAR, and ACSER algorithms.
Figure 4 is a structural diagram of a fully connected neural network, and Figure 5 is a structural diagram of a convolutional neural network.

Scientific Programming
Although the two are quite different on the surface, they are actually very similar in structure. Convolutional neural networks are also organized as layers of nodes, and each node also represents a neuron. The difference is that in a fully connected neural network every pair of neurons in two adjacent layers is usually connected, whereas in a convolutional neural network only some nodes in adjacent layers are connected, which effectively alleviates the problem of excessive parameters in the fully connected neural network. Too many parameters slow down computation, lengthen training time, and make the network more prone to overfitting. Convolutional neural networks can effectively reduce the number of parameters and speed up model training.
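The parameter saving can be made concrete with a short Python calculation. The layer sizes below (a 32 × 32 × 3 input, 16 output units, and 16 kernels of 5 × 5) are hypothetical values chosen for illustration; they are not the sizes used in this paper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel_h, kernel_w, in_depth, num_kernels):
    """Weights plus biases of a convolutional layer.

    The same kernel weights are reused at every spatial position,
    so the count does not depend on the image size.
    """
    return kernel_h * kernel_w * in_depth * num_kernels + num_kernels


# Hypothetical sizes chosen only for illustration.
fc_count = dense_params(32 * 32 * 3, 16)   # every input pixel connects to every unit
conv_count = conv_params(5, 5, 3, 16)      # 16 kernels of size 5 x 5 x 3
```

With these assumed sizes the convolutional layer needs roughly 40 times fewer parameters than the fully connected one, and the gap widens as the input image grows.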
In the first few layers of the convolutional neural network, the data nodes are transformed into a three-dimensional matrix, and only some nodes are connected to adjacent layers. A convolutional neural network usually consists of the following structure.

Input Layer.
The input layer is the input of the neural network. In an image-oriented convolutional neural network, it represents the pixel matrix obtained after a picture is read by a computer. The pixel matrix is a three-dimensional matrix. The length and width represent the size of the image, and the depth represents the number of color channels of the image. When the input picture is a black and white picture, the depth is 1; when the input picture is a color picture, the depth is 3. Starting from the input layer, the three-dimensional matrix is transformed into another three-dimensional matrix by each successive network structure until the final fully connected layer.

Convolutional Layer.
The convolutional layer is the most important part of the convolutional neural network and the key to extracting features. Figure 6 is a schematic diagram of the convolutional layer transformation. A small matrix of the input layer is convolved with the convolution kernel to obtain a small matrix of the output layer. The convolution kernel is also a three-dimensional matrix; its length and width are set manually, typically to 3 × 3 or 5 × 5. Since the two matrices in the convolution operation must have the same depth, the depth of the kernel (filter) cannot be chosen freely and must equal the depth of the input layer matrix. The number of convolution kernels is also set manually, and it determines the depth of the output layer matrix. The process from the input layer to the output layer is called forward propagation. Assuming that $a_{x,y,z}$ represents the value of a point in the input matrix, $w_{x,y,z}$ represents the value at the corresponding position of the convolution kernel, and $b$ represents the corresponding bias term, then

$$g = f\left( \sum_{x=1}^{m} \sum_{y=1}^{n} \sum_{z=1}^{q} a_{x,y,z} \, w_{x,y,z} + b \right),$$

where $g$ is the output of the corresponding point, $(m, n, q)$ are the length, width, and depth of the convolution kernel, respectively, and $f$ is the activation function. The activation function applies a nonlinear transformation to the output. Commonly used activation functions include the rectified linear unit (ReLU) and the sigmoid function.

Pooling Layer.
It can be seen from Figure 7 that a pooling layer is usually added after the convolutional layer. The pooling layer reduces the size of the input data and the number of parameters, which speeds up the network calculation and helps prevent overfitting. Similar to the forward propagation process of the convolutional layer, the pooling layer is also computed by sliding a filter over the matrix. The difference is that the pooling layer simply takes the maximum or the average value. Figure 7 shows the calculation process of a pooling layer using maximum pooling. Here, the size of the filter is 2 × 2, that is, the target area of one pooling operation is 2 × 2 nodes, and the step size is 2, that is, the filter moves 2 nodes at a time.
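The forward step of a convolutional layer at one position, and maximum pooling with a 2 × 2 filter and step size 2, can be sketched in plain Python. The sketch reads the forward step as a weighted sum of the input patch and the kernel plus a bias, passed through the activation function; the function names, the choice of ReLU, and the sample values are assumptions for illustration rather than the paper's implementation.

```python
def relu(value):
    """Rectified linear unit."""
    return max(0.0, value)


def conv_at(patch, kernel, bias, activation=relu):
    """Output of one kernel at one position: f(sum of a*w over x, y, z, plus b).

    patch and kernel are nested lists indexed [x][y][z] with equal shapes.
    """
    total = bias
    for a_row, w_row in zip(patch, kernel):
        for a_cell, w_cell in zip(a_row, w_row):
            for a, w in zip(a_cell, w_cell):
                total += a * w
    return activation(total)


def max_pool(matrix, size=2, stride=2):
    """Max pooling over a 2-D matrix with a size x size window."""
    rows = range(0, len(matrix) - size + 1, stride)
    cols = range(0, len(matrix[0]) - size + 1, stride)
    return [
        [max(matrix[r + i][c + j] for i in range(size) for j in range(size)) for c in cols]
        for r in rows
    ]


# Hypothetical 2 x 2 x 1 patch and kernel.
patch = [[[1.0], [2.0]], [[3.0], [4.0]]]
kernel = [[[1.0], [0.0]], [[0.0], [1.0]]]
g = conv_at(patch, kernel, bias=0.5)

# Hypothetical 4 x 4 feature map reduced to 2 x 2.
feature_map = [[1, 3, 2, 4], [5, 6, 7, 8], [3, 2, 1, 0], [1, 2, 3, 4]]
pooled = max_pool(feature_map)
```

A full layer repeats `conv_at` at every position and for every kernel; deep learning frameworks perform the same arithmetic in vectorized form.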

Fully Connected Layer.
The fully connected layer of a CNN, like a fully connected neural network, maps the features learned by the preceding convolutional and pooling layers to the sample label space and thus acts as a classifier. The fully connected layer is not strictly necessary; it can be replaced by a convolutional layer with a convolution kernel of size 1 × 1.
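The relationship between a fully connected layer and a 1 × 1 convolution can be checked with a small Python sketch. It treats a 1 × 1 convolution as the same dense mapping applied to the depth vector at every spatial position, so on a feature map whose spatial size is 1 × 1 the two coincide. The function names and numbers are illustrative assumptions.

```python
def dense(vector, weights, biases):
    """Fully connected layer: out[k] = sum_z vector[z] * weights[k][z] + biases[k]."""
    return [sum(v * w for v, w in zip(vector, row)) + b for row, b in zip(weights, biases)]


def conv1x1(feature_map, weights, biases):
    """1 x 1 convolution: apply the same dense map to each pixel's depth vector."""
    return [[dense(pixel, weights, biases) for pixel in row] for row in feature_map]


# Hypothetical weights: 2 input channels mapped to 2 output channels.
weights = [[1.0, 1.0], [0.0, 2.0]]
biases = [0.0, 1.0]
single_pixel_map = [[[1.0, 2.0]]]  # spatial size 1 x 1, depth 2
conv_out = conv1x1(single_pixel_map, weights, biases)
dense_out = dense([1.0, 2.0], weights, biases)
```

Here `conv_out[0][0]` and `dense_out` hold the same values, which is the sense in which the convolutional layer can stand in for the fully connected one.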

Softmax Layer.
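In multiclass classification, the softmax layer is often used as the last layer of a deep neural network so that the output of the network can be read directly as the probabilities of the classes. Recognizing the three aerobics sports injury categories in this article is a multiclass classification task. The mean square error function is introduced as the cost function in back propagation:

$$C = \frac{1}{2n} \sum_{x} \left\| y(x) - a^{L}(x) \right\|^{2},$$

where $n$ is the number of training samples, $y$ is the label value of the training data, $L$ is the number of network layers, and $a^{L}(x)$ is the output of the network. Let $u_{j}^{l}$ denote the weighted input of the $j$th unit of layer $l$, so that $a_{j}^{l} = f(u_{j}^{l})$ and $u_{k}^{l+1} = \sum_{j} w_{kj}^{l+1} a_{j}^{l} + b_{k}^{l+1}$, where $w_{kj}^{l+1}$ is the weight connecting unit $j$ of layer $l$ to unit $k$ of layer $l+1$. To calculate the error of each unit of the output layer, the error of the $j$th unit of the $L$-th layer is defined as

$$\delta_{j}^{L} = \frac{\partial C}{\partial u_{j}^{L}}.$$

According to the chain rule, we have

$$\delta_{j}^{L} = \frac{\partial C}{\partial a_{j}^{L}} \, f'(u_{j}^{L}).$$

Its matrix form is

$$\delta^{L} = \nabla_{a} C \odot f'(u^{L}),$$

where $\odot$ denotes the element-wise product. After calculating the error of the output layer, the error of the previous layer should be calculated. According to the chain rule, we can get

$$\delta_{j}^{l} = \sum_{k} \frac{\partial C}{\partial u_{k}^{l+1}} \frac{\partial u_{k}^{l+1}}{\partial u_{j}^{l}} = \sum_{k} \delta_{k}^{l+1} \, w_{kj}^{l+1} \, f'(u_{j}^{l}).$$

Therefore,

$$\delta^{l} = \left( (w^{l+1})^{T} \delta^{l+1} \right) \odot f'(u^{l}).$$

In this way, errors can be propagated backward through the entire network. Let the network outputs for joint strain, sprain, and muscle strain be $x$, $y$, and $z$. The outputs of the softmax function are

$$P(x) = \frac{e^{x}}{e^{x} + e^{y} + e^{z}}, \quad P(y) = \frac{e^{y}}{e^{x} + e^{y} + e^{z}}, \quad P(z) = \frac{e^{z}}{e^{x} + e^{y} + e^{z}}.$$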
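The softmax step for the three injury categories can be sketched in Python as follows. The raw output values are hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability measure that does not change the result.

```python
import math


def softmax(raw_outputs):
    """Convert raw network outputs into probabilities that sum to 1."""
    shift = max(raw_outputs)  # subtracting the max avoids overflow in exp
    exps = [math.exp(value - shift) for value in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["joint strain", "sprain", "muscle strain"]
probs = softmax([2.0, 1.0, 0.1])  # hypothetical raw outputs x, y, z
predicted = classes[probs.index(max(probs))]
```

The class with the largest probability is taken as the prediction, and the probabilities themselves can be kept as a confidence estimate for each category.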

Experiments and Results
The following subsections present the experiments and results of the paper.

Experimental Environment and Hyperparameter Settings.
This experiment uses the PyCharm IDE and the TensorFlow deep learning framework in a Windows environment. Training uses mini-batch learning with the Adam optimizer and a learning rate of 0.0001; 100 images are selected for each training step, and a total of 1000 iterations are performed. The experimental model parameters are shown in Table 1.
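To show what these settings control, the following framework-free Python sketch applies the Adam update rule to a single scalar parameter. The learning rate, batch size, and iteration count come from the text; the decay rates `beta1` and `beta2`, the constant `eps`, and the toy objective are common illustrative choices and are assumptions, not the paper's configuration.

```python
import math

LEARNING_RATE = 1e-4  # from the text
BATCH_SIZE = 100      # from the text (images per training step)
ITERATIONS = 1000     # from the text


def adam_step(param, grad, m, v, t, lr=LEARNING_RATE, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v).

    beta1, beta2, and eps are assumed defaults, not values from the paper.
    """
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# Toy objective (param - 3)^2, assumed purely for illustration.
param, m, v = 0.0, 0.0, 0.0
for t in range(1, ITERATIONS + 1):
    grad = 2 * (param - 3.0)
    param, m, v = adam_step(param, grad, m, v, t)
```

Because Adam normalizes the gradient, each step moves the parameter by roughly the learning rate, so with 0.0001 and 1000 iterations the total movement per parameter stays small; in actual training the gradient would be averaged over each batch of `BATCH_SIZE` images.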

Subjects of the Experiment.
The research subjects are freshman and sophomore aerobics students in a certain city, 60 girls in total. Two of them had acute sports injuries at the beginning of the experiment, so 58 subjects were finally included. There was no significant difference in height, weight, age, or training years among the 58 girls majoring in aerobics.

Experimental Methods.
The visual recognition experiment based on the convolutional neural network in this paper is divided into a training phase and a testing phase. The training phase consists of the following.

Aerobics Squat (Deep Squat).
Stand upright with your feet shoulder-width apart or slightly wider, with your toes pointing straight ahead. Place the test rod on top of the head and adjust your hands so that the elbow joints are at 90°; then extend both arms at the same time to lift the test rod above the head. Then, slowly squat until your thighs are parallel to the ground, keeping your heels on the ground, your head and chest raised, and your back straight without arching the waist, and hold the test rod as far above your head as possible. During the squat, the knee joints on both sides stay in the same plane as the feet. Do not buckle your knees inward, and always keep your toes pointing forward. This squat test is repeated three times.
This experiment takes photos of the entire training process so that the neural network can learn the characteristics of joint strain, sprain, and muscle strain.

Active Straight Leg Raise.
The subject lies supine with the arms at the sides, palms facing up; the test board is placed under the knees, with the toes pointing directly up.
The test rod is placed at the midpoint of the line between the anterior superior iliac spine and the midpoint of the patella, perpendicular to the ground. The subject lifts one leg as far as possible while keeping the knee joint straight. The knee joint on the other side should stay in contact with the test board as much as possible, with the toes pointing upward; the test is then repeated on the opposite side. This experiment takes photos of the entire training process so that the neural network can learn the characteristics of joint strain, sprain, and muscle strain. The training process lasts 1 week, and the test actions comprise 50 sets. The experimental results are shown in Table 2.
The CNN training results are shown in Figure 8. When the number of training steps exceeds 900, the loss function value drops below 0.1.

Visualization of Results.
It can be seen from Figure 9 that the acute sports injury rate in aerobics is lower in the low-risk group than in the high-risk group.
Therefore, regardless of whether there is a corrective training intervention, the injury rate of the two low-risk groups will not be very high. The members of the high-risk groups have a higher risk of injury during training, so the injuries of the two high-risk groups can better verify the effectiveness of the corrective training.
This experimental test demonstrates the effectiveness of the algorithm in this paper, which can improve the scientific degree of aerobics training.

Conclusion
Competitive aerobics has been rapidly promoted and established, and the level of sports expertise has also been significantly enhanced. The performance in some events has progressively approached and reached the advanced level. Therefore, it is vital to invest in quantitative analysis and cross-disciplinary, wide-ranging research of aerobics performance and associated factors. This paper constructs a novel dual-branch injury risk prediction algorithm for aerobics athletes based on big data and computer vision technology. Experimental research shows that big data analysis can extract effective features from competitive aerobics data. Combined with a convolutional neural network that visually recognizes aerobics images, the approach can accurately diagnose and evaluate the physical fitness development level of aerobics athletes, clarify the focus and objectives of the training content, and improve the scientific degree of aerobics training.

Data Availability
The data used to support the findings of this study are included within the article.