Extraction and Recognition Method of Basketball Players’ Dynamic Human Actions Based on Deep Learning

The extraction and recognition of human actions has always been a research hotspot in the field of state recognition, with a wide range of application prospects in many fields. In sports, it can reduce the occurrence of accidental injuries and improve the training level of basketball players. How to extract effective features from the dynamic body movements of basketball players is therefore of great significance. In order to improve the fairness of the basketball game, realize accurate recognition of the athletes' movements, and simultaneously improve the level of the athletes and regulate their movements during training, this article uses deep learning to extract and recognize the movements of basketball players. This paper implements a human action recognition algorithm based on deep learning. This method automatically extracts image features through convolution kernels, which greatly improves efficiency compared with traditional manual feature extraction methods. It uses the deep convolutional neural network VGG model on the TensorFlow platform to extract and recognize human actions. On the Matlab platform, the KTH and Weizmann datasets are preprocessed to obtain the input image set. Then, the preprocessed datasets are used to train the model, and the optimal network model and corresponding data are obtained by testing on the two datasets. Finally, the results on the two datasets are analyzed in detail, and the specific cause of each action confusion is given. Simultaneously, the recognition accuracy of each action category and the average recognition accuracy are calculated. The experimental results show that the human action recognition algorithm based on deep learning obtains a higher recognition accuracy rate.


Introduction
In the field of sports and athletics, the standard of a basketball player's actions is the key factor determining the athlete's performance. The traditional scoring method relies on the human eye, which may cause large errors or unfairness. Using human action extraction and recognition technology to capture and analyze the actions of athletes can provide more accurate action data as a scoring reference, which is very effective in improving the fairness and accuracy of scoring. It can also support training by providing accurate action data for the coach's reference, helping to improve the athlete's action level.
Many scholars at home and abroad have conducted related research on three aspects: feature collection, deep convolutional neural networks, and human action recognition. Holden uses the acceleration sensor embedded in the smartphone to collect data, finds that the placement of the sensor has a great influence on the accuracy of the experiment, and uses algorithms to extract and learn data features. However, in the classification algorithm, the classifier cannot distinguish certain similar motion states, which is also a problem that many algorithms currently have [1]. Scott uses deep convolutional neural networks for feature learning, mainly by exploring the intrinsic features of each activity and of one-dimensional time series signals, while providing a method for automatically extracting robust features from the original data. Experiments show that although the complexity level of each layer of features decreases, each layer of the convolutional neural network can still distinguish complex features [2]. Lecun uses a new general human body state recognition algorithm, which filters the data with Kalman filtering and judges human body movement, stillness, and state transitions in real time. Experimental results show that the algorithm achieves good performance on mobile devices with limited computing and storage capabilities [3].
From the perspective of deep learning, this article extracts and recognizes the dynamic human movements of basketball players, uses a Bi-LSTM neural network and TensorFlow to identify and simulate human basketball training, calculates the accuracy of the simulation experiment results, and finally collects basketball players' movement data.
The innovation of this article is to apply the convolutional neural network in deep learning to the action analysis of basketball players, which can improve the efficiency of athletes' training and improve their physical fitness and competitive ability.

Deep Convolutional Neural Network VGG Structure.
A deep convolutional neural network is a special type of neural network. Its strong learning ability is mainly realized by using multiple nonlinear feature extraction stages, which can automatically learn hierarchical representations from data.
This kind of convolutional neural network is mainly composed of an input layer, convolution layers, activation functions, pooling layers, fully connected layers, and an output layer.
VGG-16 is a 16-layer convolutional neural network model, which contains 13 convolutional layers (conv), five pooling layers (pool), and three fully connected layers (FC). In operation, the 16 weight layers (13 convolutional and three fully connected) are organized into five convolutional layer groups followed by a fully connected layer group [4,5]. The network structure is as follows.
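The layer bookkeeping above can be checked with a short sketch. The kernel counts for groups A and B come from the text below; the layer and kernel counts for groups C to E are the commonly published VGG-16 values and are assumed here, as are the names `VGG16_GROUPS` and `count_layers`.

```python
# (group name, number of conv layers, kernels per conv layer).
# Groups A and B follow the article; C, D and E use the commonly
# published VGG-16 configuration (an assumption, not from the article).
VGG16_GROUPS = [
    ("A", 2, 64),
    ("B", 2, 128),
    ("C", 3, 256),
    ("D", 3, 512),
    ("E", 3, 512),
]
# Two 4096-unit layers, then one output per action category.
FC_UNITS = [4096, 4096, "num_classes"]


def count_layers(groups, fc_units):
    """Return (conv layers, pooling layers, fully connected layers)."""
    conv = sum(n_conv for _, n_conv, _ in groups)
    pool = len(groups)  # one max-pooling layer closes each group
    fc = len(fc_units)
    return conv, pool, fc


conv, pool, fc = count_layers(VGG16_GROUPS, FC_UNITS)
print(conv, pool, fc, conv + fc)  # 13 5 3 16
```

Only the convolutional and fully connected layers carry weights, which is why the model is called 16-layer even though it also has five pooling layers.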

Convolutional Layer Group A.
Convolutional layer group A contains two convolutional layers and a maximum pooling layer. Each convolutional layer uses 64 convolution kernels with a size of 3 × 3. The step size of the convolution kernel is 1, and the padding is 1. For an input feature map of size n and a kernel of size k, the step size (s) and fill size (p) satisfy the following relationship with the output size n' [6,7]:

$$ n' = \frac{n + 2p - k}{s} + 1 $$

Convolutional Layer Group B.
Convolutional layer group B contains two convolutional layers and a maximum pooling layer. Each convolutional layer uses 128 convolution kernels of size 3 × 3. The step size of the convolution kernel is 1, and the padding is 1. Therefore, the size of the feature map is (112 + 2 × 1 − 3) ÷ 1 + 1 = 112. The pooling layer uses maximum pooling [8,9] with a size of 2 × 2 and a step size of 2, so the size of the feature map after maximum pooling is (112 − 2) ÷ 2 + 1 = 56.
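The feature map arithmetic above can be reproduced with a small sketch. The function names `conv_out`, `pool_out`, and `trace_sizes` are illustrative, and the trace assumes that all five groups use the same 3 × 3 convolution (step 1, padding 1) and 2 × 2 pooling (step 2) settings described for groups A and B.

```python
def conv_out(n, k=3, s=1, p=1):
    """Output size of a convolution: (n + 2p - k) / s + 1."""
    return (n + 2 * p - k) // s + 1


def pool_out(n, k=2, s=2):
    """Output size of max pooling without padding: (n - k) / s + 1."""
    return (n - k) // s + 1


def trace_sizes(n=224, groups=5):
    """Feature map size after each convolutional layer group."""
    sizes = []
    for _ in range(groups):
        n = conv_out(n)  # 3x3, step 1, padding 1 keeps the size unchanged
        n = pool_out(n)  # 2x2, step 2 halves the size
        sizes.append(n)
    return sizes


print(conv_out(112), pool_out(112))  # 112 56
print(trace_sizes())                 # [112, 56, 28, 14, 7]
```

Because the convolutions preserve the spatial size, only the pooling layers shrink the feature map, so a 224 × 224 input is halved once per group.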

Fully Connected Layer Group.
The fully connected layer group has three fully connected layers. The first two contain 4096 neurons each, and the number of outputs of the last fully connected layer must be consistent with the number of categories to be classified [10,11]. The last fully connected layer uses a Softmax classifier to classify the data. Softmax is a normalization function that converts a set of scores in (−∞, +∞) into a set of probabilities whose sum is 1, and this function is order-preserving: a large original score is converted into a large probability, and a small score corresponds to a small probability. The formula of the Softmax classifier is as follows:

$$ P(y = i \mid x; \theta) = \frac{e^{\theta_i^{T} x}}{\sum_{j=1}^{k} e^{\theta_j^{T} x}} $$

where the θ_i^T x are the multiple inputs, one score per category, and k is the number of categories. It can be seen from this formula that the Softmax classifier produces multiple values, that the sum of these values is 1, and that each output lies in the interval [0,1]. In this form, classification becomes a question of probability [12,13].
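As an illustration of the Softmax properties described above (outputs in [0,1], sum equal to 1, order preserved), here is a minimal sketch in plain Python. Subtracting the maximum score before exponentiating is a standard numerical-stability step added here; it is not mentioned in the article and does not change the result.

```python
import math


def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # shift by the maximum to avoid overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three action categories.
probs = softmax([2.0, -1.0, 0.5])
print(probs, sum(probs))
```

The largest score always receives the largest probability, so taking the index of the maximum probability gives the same predicted category as taking the index of the maximum score.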

Data Processing.
The datasets used in this article are the KTH and Weizmann datasets [14]. KTH includes 2391 video samples of six types of actions performed by 25 people in four different scenarios, which makes it possible to use the same input data to systematically evaluate the performance of different algorithms. The KTH and Weizmann datasets are the most cited databases in the field of behavior recognition, and they have greatly promoted research in this field. Both are public datasets: KTH includes six types of human motions, and Weizmann includes ten types of human motions [15,16]. First, the Matlab platform is used to sample the video data of the two datasets every five frames to form the marker set of each dataset. Then, valid frames that correctly express the characteristics of the action are selected from this set and resized through geometric transformations into a 224 × 224 image dataset [17,18].
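The article performs this preprocessing in Matlab. The following Python sketch only illustrates the two operations involved, sampling every fifth frame and resizing to 224 × 224, under stated assumptions: sampling starts at frame 0, frames are plain nested lists of pixel values, and nearest-neighbor interpolation stands in for whatever geometric transformation the authors actually used. The function names are hypothetical.

```python
def sample_frame_indices(num_frames, step=5):
    """Indices of the frames kept when sampling every `step` frames."""
    return list(range(0, num_frames, step))


def resize_nearest(image, out_h=224, out_w=224):
    """Resize a 2-D image (list of rows) with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


print(sample_frame_indices(23))  # [0, 5, 10, 15, 20]

tiny = [[1, 2], [3, 4]]          # toy 2 x 2 "frame"
resized = resize_nearest(tiny)
print(len(resized), len(resized[0]))  # 224 224
```

Selecting the valid frames that clearly express the action is a separate, manual or heuristic step that this sketch does not cover.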

Bi-GRU Neural Network.
The process of obtaining a linear sum between the existing state and the newly calculated state is similar to that of the LSTM unit [19,20]. However, the GRU has no mechanism to control the degree of exposure of its state, and the entire state is exposed every time [21,22]:

$$ z_t = \sigma\left(w_z \cdot [h_{t-1}, x_t]\right) $$
$$ r_t = \sigma\left(w_r \cdot [h_{t-1}, x_t]\right) $$
$$ \tilde{h}_t = \tanh\left(w \cdot [r_t \odot h_{t-1}, x_t]\right) $$
$$ h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t $$

where z_t represents the value of the update gate; h is the value of the hidden layer; x_t is the input at time t; σ represents the sigmoid activation function; tanh represents the hyperbolic tangent activation function; w represents the weights; and r_t represents the value of the reset gate [23,24].

The preceding sections introduce the neural networks used in this article. Each neural network has its own advantages and characteristics. Because smartphones are easy to carry, it is very convenient for coaches to test the athletes' movements, and TensorFlow can be used on the phone. TensorFlow officially supports iOS and Android, and the display of this model is very clear. Transplanting the trained model to a smartphone to test the recognition performance of the neural network model is the best way to test the model.
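To make the roles of the update gate and reset gate concrete, the following is a minimal pure-Python sketch of one GRU step and of a bidirectional pass over a sequence. It assumes the common formulation in which the gates act on the concatenation of the previous hidden state and the current input, omits bias terms, and uses hypothetical names (`gru_step`, `run_gru`, `bi_gru`). It is not the TensorFlow implementation used in the article.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x, h_prev, Wz, Wr, Wh):
    """One GRU update; every weight matrix has len(h_prev) + len(x) columns."""
    concat = h_prev + x                                  # [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(Wz, concat)]         # update gate
    r = [sigmoid(a) for a in matvec(Wr, concat)]         # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x   # [r * h_{t-1}, x_t]
    h_cand = [math.tanh(a) for a in matvec(Wh, gated)]   # candidate state
    # Linear sum between the existing state and the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]


def run_gru(seq, h0, weights):
    """Run a GRU over a sequence and return the hidden state at each step."""
    h, outputs = h0, []
    for x in seq:
        h = gru_step(x, h, *weights)
        outputs.append(h)
    return outputs


def bi_gru(seq, h0, fwd_weights, bwd_weights):
    """Concatenate forward and backward hidden states at each time step."""
    fwd = run_gru(seq, h0, fwd_weights)
    bwd = run_gru(seq[::-1], h0, bwd_weights)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]
```

With all weights set to zero, both gates equal 0.5 and the candidate state is 0, so each step simply halves the previous hidden state, which is a convenient sanity check on the linear-sum update.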

State Extraction and Recognition of Human Basketball.
This article mainly uses the acceleration sensors and gyroscopes in Android smartphones to collect status data for common human basketball movements. A neural network architecture is built on the TensorFlow deep learning platform using a variant of the recurrent neural network, and the neural network model that performs better in the experimental results is selected and transplanted to the smartphone. The gyroscope is used to measure angular velocity, and the acceleration sensor is used to measure linear acceleration. The former works on the principle of inertia, and the latter on the principle of force balance. The measured value of the acceleration sensor is correct over long periods, but over short periods it contains errors due to signal noise. The gyroscope is more accurate over short periods, but over longer periods it accumulates drift error.
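The complementary error behavior of the two sensors is often exploited by blending them. The article does not state that it fuses the signals this way, so the following complementary filter is only an illustration of the idea, with a hypothetical function name and an assumed blending factor `alpha`.

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98, angle0=0.0):
    """Blend integrated gyroscope rate with accelerometer-derived angle.

    gyro_rates   : angular velocity samples (e.g. degrees per second)
    accel_angles : angle estimates derived from the accelerometer (degrees)
    dt           : sampling interval in seconds
    alpha        : weight of the short-term gyroscope estimate (assumed value)
    """
    angle = angle0
    estimates = []
    for rate, acc_angle in zip(gyro_rates, accel_angles):
        # Gyroscope integration is trusted over short periods, while the
        # accelerometer term slowly corrects long-term drift.
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * acc_angle
        estimates.append(angle)
    return estimates


# Toy data: stationary device tilted at 10 degrees, starting estimate 0.
est = complementary_filter([0.0] * 500, [10.0] * 500, dt=0.01)
print(round(est[-1], 3))
```

In this toy run the estimate climbs steadily toward the accelerometer angle, showing how the long-term reference pulls the integrated gyroscope value back when it is wrong.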

Human Basketball Status Recognition Steps.
First, collect acceleration and gyroscope sensor data under different basketball motion states of the human body; here, the PhyPhox sensor data acquisition software is used for data acquisition. Second, preprocess the collected data, including denoising, filtering, and labeling the human basketball motion state. Third, fix the length and data structure of each piece of data to achieve data segmentation. Fourth, use the convolutional neural network in deep learning to extract the features of each collected human action. Fifth, continuously adjust the hyperparameters and neural network structure to obtain a neural network model. Sixth, combine the neural network model with TensorFlow and port it to Android mobile phones for viewing and analysis.
The data in Table 1 are used to calculate the precision, recall, and F values. Precision, also called the correct rate, represents the proportion of related documents retrieved among all documents retrieved. Recall represents the proportion of retrieved related documents among all related documents. F-measure is the weighted harmonic average of precision (P) and recall (R). The final classification accuracy is 0.867.
From the experimental results, it can be seen that the Bi-LSTM neural network recognizes WK and WU better, but the recognition probability of SI and ST is lower, mainly because the two actions SI and ST are relatively similar and both belong to the static category. Motion states with highly similar features are difficult to distinguish, which reduces the overall recognition accuracy. Recognizing similar motion states has always been a difficulty in state recognition. Possible remedies are to increase the amount of data or to add different types of sensors so that similar motion states acquire more distinctive characteristics.
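The precision, recall, and F-measure definitions above can be computed per class from a confusion matrix. The sketch below is a generic implementation with hypothetical function names; the example counts are made up for illustration and are not the values in Table 1.

```python
def per_class_metrics(confusion, labels, beta=1.0):
    """Precision, recall and F-measure per class.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    results = {}
    b2 = beta * beta
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column sum
        actual = sum(confusion[i])                    # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        # Weighted harmonic mean of P and R; beta = 1 gives the usual F1.
        f = (1 + b2) * p * r / (b2 * p + r) if (p + r) else 0.0
        results[label] = (p, r, f)
    return results


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Made-up two-class example (not the article's data).
labels = ["WK", "SI"]
confusion = [[8, 2],
             [1, 9]]
print(per_class_metrics(confusion, labels))
print(overall_accuracy(confusion))  # 0.85
```

Reporting the per-class values alongside the overall accuracy makes confusions between similar states such as SI and ST visible, since they lower the precision and recall of exactly those classes.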
It can be seen from Figure 1 that the value at a diagonal position such as WK-WK is the number of samples of that class that are correctly classified. The larger the number at these positions, the better the classification result. The numbers at all other positions are misclassifications. Table 2 lists the training accuracy rates generated by TensorFlow. In Figure 2, the gray bars represent the recognition accuracy of the training dataset during training, and the red curve represents the recognition accuracy of the test dataset. It can be seen that, from the beginning to the end of training, the overall slope of the two curves becomes smaller. The accuracy on the training data is high, indicating that the model recognizes the data it has learned well. The model also recognizes unknown data well, indicating strong generalization ability. In the first 200 iterations, the two curves rise quickly, indicating that the loss function guides the model well, so that it converges quickly and extracts the key features of the human basketball state. After 200 iterations, learning slows down, and the model begins to extract and learn finer feature details.

Analysis of Training Accuracy Generated by TensorFlow for Human Basketball.
In general, the training accuracy generated by TensorFlow shows that the model can quickly extract the key features of the athlete's actions, and that the more iterations are run, the more detailed the feature extraction becomes. In Figure 3, the difference between the RUN and WAL signals is very obvious, and the periodicity of each signal is relatively regular. The difference lies mainly in the length of each cycle and the size of the highest and lowest values. Through comparison between the signals, the differences between them and their respective movement characteristics are found. In feature extraction, analyzing the acceleration and angular velocity signals allows the action state to be extracted better.

Using APP to Collect Basketball Player Movement Data.
This experiment counts the number of times each motion state is correctly identified in 100 tests. For example, after collecting 100 pieces of WAL data on a smartphone and applying 5-bit, 7-bit, and 9-bit mean filtering, the number of samples correctly recognized by the neural network model is 95, 90, and 88, respectively. Through experimental comparison, it is found that the number of filter bits used in one pass of mean filtering also has a certain influence on the experimental results. The experiment is shown in Figure 4.
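A minimal sketch of mean filtering follows, assuming that "5-bit, 7-bit, and 9-bit" refers to moving-average window lengths of 5, 7, and 9 samples (this reading is an assumption, as is the function name). Only positions where the full window fits are returned, so the output is shorter than the input.

```python
def mean_filter(signal, window):
    """Moving average over `window` consecutive samples (no edge padding)."""
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    return [
        sum(signal[i:i + window]) / window
        for i in range(len(signal) - window + 1)
    ]


print(mean_filter([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]

# Hypothetical accelerometer trace; longer windows smooth more strongly.
trace = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
for w in (5, 7, 9):
    print(w, mean_filter(trace, w))
```

A longer window removes more noise but also flattens the peaks and troughs that distinguish one motion state from another, which is one plausible reason why the recognition count reported above drops as the window grows.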

Conclusions
The use of deep learning-based dynamic human action extraction and recognition technology for basketball players can realize the standardization and determination of athletes' movements, which is very helpful for future Chinese athletes to achieve better results on the playing field.
In the process of neural network model structure design and parameter tuning, it is necessary to use deep learning techniques to optimize, including regularization and discarding hidden layer units, and gradually adjust the neural network structure through continuous experiments to obtain a model with good performance.
Using the public human body motion datasets as the object, the data are preprocessed to construct a data format that meets TensorFlow requirements. The network structures of convolutional neural networks, classic neural networks, unidirectional recurrent neural networks, and bidirectional recurrent neural networks are compared.
In the process of constructing the neural network structure, the long short-term memory (LSTM) network of the recurrent neural network and its variants are analyzed and compared, and the neural network model with better performance is obtained by continuously adjusting the model and its parameters. The design of the neural network model structure in this article relies on deep learning, but the angle of the experiment is not comprehensive enough, and the model has not yet been put into the training of actual basketball players. With further study of neural networks, the model can continue to be optimized.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Mobile Information Systems