Basketball Motion Posture Recognition Based on Recurrent Deep Learning Model

In order to improve the training eect of athletes and eectively identify the movement posture of basketball players, we propose a basketball motion posture recognition method based on recurrent deep learning. A one-dimensional convolution layer is added to the neural network structure of the deep recurrent Q network (DRQN) to extract the athlete pose feature data before the long short-term memory (LSTM) layer. e acceleration and angular velocity data of athletes are collected by inertial sensors, and the multi-dimensional motion posture features are extracted from the time domain and frequency domain, respectively, and the posture recognition of basketball is realized by DRQN. Finally, the new reinforcement learning algorithm is trained and tested in a time-series-related environment. e experimental results show that the method can eectively recognize the basketball motion posture, and the average accuracy of posture recognition reaches 99.3%.


Introduction
In basketball training and competition, coaches need to formulate training plans according to the individual conditions of different players in order to improve their basketball skills. In the traditional training method, coaches formulate training plans based on their own training theory and experience, combined with the skill level of the players [1]. This training mode is highly subjective: coaches must spend a lot of time analyzing the posture of athletes, and it is difficult to evaluate the training effect objectively [2]. The core of modern physical training is precision and efficiency. If the coach can accurately grasp the movement posture of the athlete, the training effect can be greatly improved. Therefore, collecting and analyzing the posture data of basketball players and accurately identifying their movement postures is of great significance for making coaches' training plans more scientific and improving the training effect of athletes, and it has become a new research direction [3].
With the rapid growth of computing power, it has become possible to introduce deep learning into reinforcement learning to solve problems with continuous state spaces. In 2015, the deep Q-network (DQN) proposed by Mnih and colleagues mitigated the instability of training by employing experience replay and a target network, and reached the level of human players on many Atari 2600 games, bringing deep reinforcement learning to wide attention. Since then, various improvements to DQN have emerged. Reference [4] proposes prioritized experience replay, which allows important experiences to be replayed more frequently and thereby improves the efficiency of reinforcement learning. The double deep Q-network proposed in [5] in 2016 addresses the problem of Q-value overestimation. In the same year, [6] added a dueling (competitive) structure to DQN, which improved its learning efficiency; this DQN variant is called the dueling deep Q-network [7].
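To make the overestimation point concrete, the sketch below contrasts the standard DQN bootstrap target with the Double DQN target for a single transition. It is a minimal illustration rather than the implementation from [5]; the function names, the default discount factor, and the plain-list Q-values are assumptions chosen for readability.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: r + gamma * max_a' Q_target(s', a').

    The target network both selects and evaluates the next action, which
    tends to overestimate values when the Q-estimates are noisy.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action and
    the target network evaluates it, decoupling selection from evaluation."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]
```

For example, with online Q-values [1.0, 2.0], target Q-values [3.0, 0.5], reward 1 and gamma 0.9, the DQN target is 1 + 0.9 × 3.0 = 3.7, while the Double DQN target uses the online network's choice (action 1) and gives 1 + 0.9 × 0.5 = 1.45.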
DQN and the reinforcement learning algorithms derived from it are regarded as powerful algorithms, and in many areas, such as simple 2D games, their performance exceeds that of ordinary human players. However, this excellent performance is often limited to environments with artificially specified rules, such as chess and video games, and DQN remains difficult to apply to real-world problems. This is because past research on reinforcement learning algorithms usually assumed that the full state of the environment is available. In the real world, we obviously do not have the God's-eye view available in chess and games; our knowledge of the environment's state is obtained through observation. Observations inevitably contain errors or even missing information, which makes it impossible to obtain the complete state from them. In this situation, the performance of DQN, which is based on the Markov decision process, naturally degrades considerably.
To solve the above problems, reference [8] proposed the deep recurrent Q-network (DRQN): on the basis of DQN, the first fully connected layer was replaced with a long short-term memory (LSTM) layer of the same size, which addresses the partial observability of real environments. To resolve the conflict between reinforcement learning updates and the parameter updates of a recurrent neural network, Matthew Hausknecht et al. proposed two matching update methods: bootstrapped sequential updates and bootstrapped random updates. In partially observable Markov environments, DRQN shows a significant improvement over DQN.
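The two update schemes differ mainly in how training sequences are drawn from replay memory. The following sketch assumes each episode is stored as a list of transitions and shows how the random variant samples fixed-length subsequences from random start points while the sequential variant replays whole episodes from the beginning. The names `sample_random_sequences`, `sample_sequential`, and `seq_len` are hypothetical, and the reset of the LSTM state at the start of each sequence is only noted in a comment.

```python
import random


def sample_random_sequences(episodes, batch_size, seq_len, rng=random):
    """Bootstrapped random updates (sketch).

    Picks a random episode and a random start point for each batch entry and
    returns a sub-sequence of seq_len consecutive transitions. During training,
    the LSTM hidden state would be reset to zero at the start of each one.
    """
    eligible = [ep for ep in episodes if len(ep) >= seq_len]
    if not eligible:
        raise ValueError("no episode is long enough for seq_len")
    batch = []
    for _ in range(batch_size):
        ep = rng.choice(eligible)
        start = rng.randint(0, len(ep) - seq_len)  # randint is inclusive
        batch.append(ep[start:start + seq_len])
    return batch


def sample_sequential(episodes, batch_size, rng=random):
    """Bootstrapped sequential updates (sketch): whole episodes, replayed
    from their first transition to their last."""
    return [list(rng.choice(episodes)) for _ in range(batch_size)]
```

Random subsequences keep batches a fixed length and decorrelate updates, at the cost of starting each sequence without the memory that earlier steps would have built up.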
Basketball motion posture recognition is a kind of human posture recognition. At present, human posture recognition methods fall into two main categories: those based on inertial sensors and those based on image acquisition. Image-based methods can be further divided into monocular and multi-view (multi-camera) video recognition according to the number of image acquisition devices. The general idea of image-based recognition is to capture images or video of the athlete with a camera, extract the motion features contained in them, and finally design a classifier to recognize the athlete's posture [9][10][11][12][13]. Image-based posture recognition is relatively mature, and its recognition accuracy is high. However, such methods suffer from blind spots in video coverage, require a large amount of equipment, and impose a heavy data processing burden, which hinders their popularization and application [14]. The basic idea of inertial sensor recognition is that the athlete wears simple, lightweight data acquisition sensors that send the collected data to a processing terminal in real time, where the athlete's posture is recognized from the posture data [15]. This kind of method compensates for the shortcomings of image-based recognition, places low demands on the usage environment, and has high recognition efficiency, so it has become a popular approach in basketball posture recognition research.
Based on these studies, we propose a basketball motion posture recognition method based on inertial sensors and DRQN. First, an inertial-sensor-based data acquisition module for basketball motion posture is designed, and features for basketball motion posture recognition are extracted from the time domain and the frequency domain. Then, we build a posture recognition model for basketball players based on a one-dimensional convolutional layer and DRQN. Finally, we conduct experiments to evaluate the model, and the results verify the effectiveness and accuracy of the method.

Deep Recurrent Q Network.
In a real environment, it is often difficult for an agent to obtain the complete state. In other words, real-world environments usually do not strictly satisfy the Markov property [8]. Partially observable Markov decision processes (POMDPs) mathematically model the connection between observations and the true state and can therefore better describe the dynamics of real environments [16]. A POMDP extends the Markov decision process (MDP) with an observation space Ω and a conditional observation probability function O, and defines what the agent directly perceives of the environment as an observation o ∈ Ω. The observation is related to the true state probabilistically, that is, o ∼ O(s). A POMDP can thus be described by the six-tuple (S, A, P, R, Ω, O), whose elements are the state space, the action space, the state transition probability function, the reward function, and the two additions relative to the MDP: the observation space Ω and the conditional observation probability function O. Clearly, when each observation o corresponds one-to-one to a state s, the POMDP reduces to an MDP. The DRQN proposed by Matthew Hausknecht and Peter Stone in 2017 modified the network structure of DQN by replacing its first fully connected layer with an LSTM layer of the same size. Because the LSTM layer gives the network memory, it is better able to cope with the incomplete information in observations. The neural network structure of DRQN is shown in Figure 1.

Input and Output Structure.
The hardware structure of the data acquisition system is shown in Figure 2. The hardware performs data acquisition and data transmission and consists of four data acquisition nodes and one data transmission base station. Each data acquisition node consists of a three-axis gyroscope (MPU3050) and a three-axis accelerometer/magnetometer (LSM303DLH), which collect the angular velocity and acceleration data of the human body, respectively. The core component of the base station is the nRF24L01 wireless transceiver, which receives the data collected by the nodes and sends them to the data terminal over the wireless network. The core processing of the data acquisition module is performed by a 32-bit ARM microcontroller (STM32F103), and the module is powered by a 3.7 V lithium-ion battery. Signal transmission in the data acquisition module has two parts. First, the sensor nodes send the collected human posture data to the base station. Second, the base station sends the data to the processing terminal. The signal transmission between the sensor nodes and the base station is realized with a wireless sensor network.
For this link, the problems to be overcome are minimizing the data collision rate, reducing data loss, and improving the accuracy of data collection. The signal transmission between the base station and the processing terminal is based on a star topology network and adopts a time-division multiplexing protocol; here, the clock deviation between different nodes must be calibrated to keep their timing uniform.
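The time-division scheme and clock calibration can be illustrated with a small sketch. The paper does not give slot lengths or the calibration method, so the 5 ms slot, the four-node frame, and the NTP-style two-way offset estimate below are all assumptions chosen for illustration; `slot_owner`, `estimate_offset`, and `to_base_time` are hypothetical names.

```python
def slot_owner(t_ms, num_nodes=4, slot_ms=5.0):
    """Return the index of the node allowed to transmit at time t_ms.

    A frame is num_nodes consecutive slots and node i owns slot i of every
    frame, so no two nodes transmit at once as long as their clocks agree.
    """
    return int(t_ms // slot_ms) % num_nodes


def estimate_offset(t1, t2, t3, t4):
    """Two-way (NTP-style) clock offset estimate.

    t1: node sends a request (node clock)
    t2: base station receives it (base clock)
    t3: base station sends a reply (base clock)
    t4: node receives the reply (node clock)
    Returns the estimated (base clock - node clock), assuming the delay is
    the same in both directions.
    """
    return ((t2 - t1) + (t3 - t4)) / 2.0


def to_base_time(t_node, offset):
    # Map a node timestamp onto the base-station time base.
    return t_node + offset
```

For instance, if a node's clock runs 10 ms behind the base station and each hop takes 2 ms, timestamps t1 = 100, t2 = 112, t3 = 113, t4 = 105 give an estimated offset of 10 ms, which the node can add to its timestamps before its slot boundaries are computed.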
In data collection, multiple sensor nodes are generally used to collect relevant information. Besides the structure of the nodes themselves, effective and complete data transmission is another major problem. According to the transmission medium, data transmission is currently either wired or wireless. Wired transmission is more stable and reliable, but it has not been widely used because of its complex installation and wiring and the many restrictions it places on motion detection.
Wireless transmission has many advantages in the field of human posture recognition. Commonly used wireless communication technologies include Bluetooth, ZigBee, and radio frequency identification (RFID). Wireless transmission reduces the influence of the sensors on normal activity, so most systems adopt it. In designing the wireless transmission protocol, the network architecture must be established first. Among common network topologies, star and mesh topologies are widely used in practice. In body area network applications, a star topology requires every node to be connected directly to the receiving node; it is often used because its communication structure is simple and easy to implement.
Compared with the star topology, the mesh topology is more complex, but it can use multi-hop paths to reduce the path loss caused by diffraction, and because data are transmitted only between adjacent nodes, each node can keep its transmission power low. The network structure should therefore be chosen according to the needs of the specific study.

Feature Extraction of Basketball Motion Posture.
The basketball posture data mainly comprise acceleration and angular velocity. Let $a_{x,n}$, $a_{y,n}$, $a_{z,n}$ denote the acceleration along the x, y, and z axes at the n-th sampling point, and $g_{x,n}$, $g_{y,n}$, $g_{z,n}$ the angular velocity along the x, y, and z axes at the n-th sampling point. The vector sum of the acceleration at the n-th point is

$$a_n = \sqrt{a_{x,n}^2 + a_{y,n}^2 + a_{z,n}^2}. \tag{1}$$

Similarly, the vector sum of the angular velocity at the n-th point is

$$g_n = \sqrt{g_{x,n}^2 + g_{y,n}^2 + g_{z,n}^2}. \tag{2}$$

The three acceleration components and three angular velocity components from the data acquisition module are combined with the vector sums computed by equations (1) and (2), giving eight channels. For a channel with samples $x_0, x_1, \dots, x_{K-1}$ in a window of K sampling points, the mean and variance are

$$\bar{x} = \frac{1}{K}\sum_{n=0}^{K-1} x_n, \tag{3}$$

$$\sigma^2 = \frac{1}{K}\sum_{n=0}^{K-1} \left(x_n - \bar{x}\right)^2. \tag{4}$$

The extracted time-domain features are: the means of the acceleration along the x, y, and z axes and of the acceleration vector sum (four dimensions); the means of the angular velocity along the x, y, and z axes and of the angular velocity vector sum (four dimensions); the variances of the acceleration along the x, y, and z axes and of the acceleration vector sum (four dimensions); and the variances of the angular velocity along the x, y, and z axes and of the angular velocity vector sum (four dimensions), for a total of 16 time-domain posture parameters.

Mathematical Problems in Engineering

Next, based on the Fourier transform, the time-domain data are transformed into the frequency domain:

$$S(n) = \sum_{k=0}^{K-1} x_k \, e^{-j 2\pi n k / K}, \quad n = 0, 1, \dots, K-1, \tag{5}$$

where $S(n)$ is the value at the n-th sampling point in the frequency domain. The frequency-domain feature used for basketball posture recognition is the peak of the Fourier transform and the frequency at which it occurs:

$$S_{\text{peak}} = \max_{n} \lvert S(n) \rvert, \qquad f_{\text{peak}} = \frac{n_{\text{peak}} \, f}{K}, \tag{6}$$

where $n_{\text{peak}}$ is the index of the peak, K is the number of sampling points in the frequency domain, and f is the sampling frequency of the data acquisition sensor. The extracted frequency-domain features are: the frequency-domain peak values of the acceleration along the x, y, and z axes and the frequencies at which they occur (six dimensions); the peak value of the acceleration vector sum and its frequency (two dimensions); the peak values of the angular velocity along the x, y, and z axes and their frequencies (six dimensions); and the peak value of the angular velocity vector sum and its frequency (two dimensions), for a total of 16 frequency-domain features.
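The feature pipeline of equations (1) to (6) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the direct O(K²) DFT is used for clarity, the DC bin and the mirrored upper half of the spectrum are skipped when searching for the peak (an assumption, so that gravity's constant offset does not dominate the accelerometer peak), and all function names are hypothetical.

```python
import cmath
import math


def vector_sum(xs, ys, zs):
    # Equations (1)/(2): per-sample magnitude of a three-axis signal.
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(xs, ys, zs)]


def mean(v):
    # Equation (3).
    return sum(v) / len(v)


def variance(v):
    # Equation (4): population variance over the window.
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)


def dft(v):
    # Equation (5): direct discrete Fourier transform; fine for short windows.
    K = len(v)
    return [sum(v[k] * cmath.exp(-2j * math.pi * n * k / K) for k in range(K))
            for n in range(K)]


def spectral_peak(v, fs):
    """Equation (6): return (peak magnitude, peak frequency in Hz).

    Assumption: bins 1..K//2 are searched (DC and the mirrored half are
    skipped), so the window needs at least 2 samples.
    """
    K = len(v)
    spectrum = dft(v)
    n_peak = max(range(1, K // 2 + 1), key=lambda n: abs(spectrum[n]))
    return abs(spectrum[n_peak]), n_peak * fs / K


def extract_features(acc, gyro, fs):
    """acc, gyro: (x, y, z) tuples of equal-length sample lists.

    Returns 16 time-domain values (means, then variances, of the channels
    ax, ay, az, |a|, gx, gy, gz, |g|) followed by 16 frequency-domain values
    (peak magnitude and peak frequency of each channel).
    """
    channels = list(acc) + [vector_sum(*acc)] + list(gyro) + [vector_sum(*gyro)]
    time_feats = [mean(c) for c in channels] + [variance(c) for c in channels]
    freq_feats = []
    for c in channels:
        freq_feats.extend(spectral_peak(c, fs))
    return time_feats + freq_feats
```

As a sanity check, a 5 Hz sine sampled at 100 Hz over 100 points has its spectral peak at bin 5, so `spectral_peak` reports a frequency of 5 × 100 / 100 = 5 Hz and a magnitude of K/2 = 50.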

Model Establishment.
Feature selection, also known as attribute selection or variable subset selection, is the process of selecting a subset of relevant attributes for building a classification model. The primary reason for feature selection is that not all attributes in the feature set obtained by feature extraction are relevant and useful; some may be redundant. Irrelevant attributes contribute nothing to the model and make it more complex because of the redundancy and irrelevance of the data. Therefore, reasonable feature screening is necessary. Feature selection differs substantially from feature extraction: feature extraction derives feature vectors from the original data, while feature selection chooses a suitable subset of those features.
Feature selection has three main purposes: (1) simplifying the model and reducing computational complexity; (2) shortening training time; and (3) improving generalization to avoid overfitting. Commonly used feature selection algorithms combine an evaluation function with a search strategy such as sequential forward/backward search, decision trees, best-first search, or genetic algorithms. The evaluation function reflects the quality of the selected feature subset and can measure, for example, the correlation between features and classes or the classifier error rate. In addition, commonly used methods for reducing the feature dimension and the amount of system computation include linear discriminant analysis (LDA) and principal component analysis (PCA) [17]. Basketball motion posture recognition amounts to constructing a classifier that recognizes the athlete's posture from the features of the sensor data: the extracted pose features are input into the classifier, which outputs a specific basketball action. Feature extraction yields a 16-dimensional feature parameter set for identifying the posture of basketball players. However, some of these features are unrelated or only weakly related to the player's posture.
Other features carry redundant information. Feeding all of these features into the classifier would not only reduce its recognition performance but also seriously affect its recognition efficiency. Therefore, features must be selected before basketball pose recognition. The purpose of feature selection is to reduce the dimensionality of the data while retaining the feature parameters that are highly relevant to basketball posture recognition. Based on experimental testing, PCA was chosen to select the feature parameters.
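PCA-based dimensionality reduction can be sketched in plain Python: compute the covariance matrix, extract leading eigenvectors by power iteration with deflation, and keep components until a cumulative explained-variance threshold is reached. The 0.95 threshold, the power-iteration solver, and all function names are assumptions; the paper does not state how many components were retained. In practice a library routine such as scikit-learn's `PCA` would normally be used instead.

```python
import math


def covariance_matrix(rows):
    # rows: list of samples, each a list of d feature values.
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, means


def power_iteration(mat, iters=1000, tol=1e-12):
    # Leading eigenvalue/eigenvector of a symmetric positive semidefinite matrix.
    d = len(mat)
    v = [1.0 / (j + 1) for j in range(d)]  # non-uniform start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        w = [x / norm for x in w]
        new_eig = sum(w[i] * sum(mat[i][j] * w[j] for j in range(d)) for i in range(d))
        converged = abs(new_eig - eig) < tol
        v, eig = w, new_eig
        if converged:
            break
    return eig, v


def pca(rows, var_threshold=0.95):
    """Return (components, eigenvalues, means), keeping the fewest leading
    components whose cumulative explained variance reaches var_threshold."""
    cov, means = covariance_matrix(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    comps, eigs = [], []
    if total == 0.0:
        return comps, eigs, means
    mat = [row[:] for row in cov]
    for _ in range(d):
        lam, v = power_iteration(mat)
        if lam <= 1e-12:
            break
        comps.append(v)
        eigs.append(lam)
        # Deflation: remove the found component before searching for the next.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        if sum(eigs) / total >= var_threshold:
            break
    return comps, eigs, means


def project(rows, comps, means):
    # Coordinates of each sample in the reduced PCA space.
    return [[sum((r[j] - means[j]) * c[j] for j in range(len(r))) for c in comps]
            for r in rows]
```

Applied to the extracted feature vectors, `pca` would return the retained directions and `project` the reduced features passed on to the classifier.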
After the optimal features are selected from the 16-dimensional feature set with PCA, a classifier is constructed to recognize the posture of basketball players. The neural networks of both DQN and DRQN contain two-dimensional convolutional layers. Typically, if the input is a feature vector rather than an image, the networks used by DQN and DRQN do not contain convolutional layers. However, convolutional layers can extract not only image features but also features along the temporal dimension [18]. Therefore, this model uses a one-dimensional convolutional layer to extract the temporal features of athlete poses. The network structure of the proposed model is shown in Figure 3. A one-dimensional convolutional layer is added to the neural network used in DRQN; we call the result a one-dimensional convolutional recurrent neural network. The one-dimensional convolutional layer convolves the input data along the time dimension and extracts features in that dimension. Experiments show that this improves the feature extraction and fitting ability of the neural network, which improves the decision-making of the agent and makes it perform better in time-series-related environments.

The Recognition of Basketball Motion Postures.
To address the convergence problem of deep reinforcement learning in environments with high-dimensional state spaces, this study uses a one-dimensional convolutional layer to extract features of the state along the time dimension. Let the input be $X \in \mathbb{R}^{N \times C_{\text{in}} \times L_{\text{in}}}$ and the output be $Y \in \mathbb{R}^{N \times C_{\text{out}} \times L_{\text{out}}}$. The one-dimensional convolutional layer is then

$$Y(N_i, C_{\text{out}_j}) = \beta(C_{\text{out}_j}) + \sum_{k=0}^{C_{\text{in}}-1} \alpha(C_{\text{out}_j}, k) \otimes X(N_i, k). \tag{7}$$

In (7), ⊗ is the cross-correlation operation, N is the batch size of the training data, $C_{\text{in}}$ and $C_{\text{out}}$ are the numbers of input and output channels, $L_{\text{in}}$ and $L_{\text{out}}$ are the lengths of the input and output data, and kernel_size is the size of the one-dimensional convolution kernel. $\alpha \in \mathbb{R}^{C_{\text{out}} \times C_{\text{in}} \times \text{kernel\_size}}$ is the convolution kernel of this layer, and $\beta \in \mathbb{R}^{C_{\text{out}}}$ is its bias term.

The LSTM layer is a recurrent neural network layer that gives the network memory. Generally, its input is a time series of feature vectors $x \in \mathbb{R}^{N \times L_{\text{in}} \times H_{\text{in}}}$. For simplicity, assume that a batch contains only one sample and that the feature vector contains only one feature, so $x \in \mathbb{R}^{L_{\text{in}}}$, that is, $x = [x_1, x_2, \dots, x_t, \dots, x_{L_{\text{in}}}]^{\mathsf T}$. For the element $x_t$ at any time t, the LSTM layer computes

$$\begin{aligned}
i_t &= \sigma\left(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}\right),\\
f_t &= \sigma\left(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}\right),\\
g_t &= \tanh\left(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}\right),\\
o_t &= \sigma\left(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}\right),\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t,\\
h_t &= o_t \odot \tanh(c_t).
\end{aligned} \tag{8}$$

In (8), $i_t$, $f_t$, $g_t$, and $o_t$ are the input, forget, cell, and output gates at time t, respectively; $c_t$ and $h_t$ are the cell state and hidden state at time t; σ is the sigmoid function, ⊙ denotes element-wise multiplication, and the W and b terms are the weights and biases of the corresponding gates.

The fully connected layer is the most classic component of a neural network. Let its input be the feature vector $X \in \mathbb{R}^{N \times H_{\text{in}}}$ and its output $Y \in \mathbb{R}^{N \times H_{\text{out}}}$; then

$$Y = \sigma\left(X A^{\mathsf T} + b\right), \tag{9}$$

where σ is a nonlinear activation function, commonly the sigmoid or ReLU function, N is the batch size, $H_{\text{in}}$ and $H_{\text{out}}$ are the numbers of input and output features, A is the weight matrix of the layer, and b is its bias term. Figure 4 shows the detailed structure of the proposed neural network.
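The three layer equations can be checked against a small dependency-free sketch. It assumes stride 1 and no padding for the convolution (so $L_{\text{out}} = L_{\text{in}} - \text{kernel\_size} + 1$), a scalar LSTM input as in the simplified case above, a single combined bias per LSTM gate, and ReLU as the activation in the fully connected layer. The weight layouts and function names are hypothetical and are not the authors' implementation; a real model would use a framework such as PyTorch (`torch.nn.Conv1d`, `torch.nn.LSTM`, `torch.nn.Linear`).

```python
import math


def conv1d(X, alpha, beta):
    """Equation (7) as cross-correlation.

    X: [N][C_in][L_in], alpha: [C_out][C_in][kernel_size], beta: [C_out].
    Stride 1, no padding (assumption): L_out = L_in - kernel_size + 1.
    """
    k = len(alpha[0][0])
    out = []
    for x in X:
        L_out = len(x[0]) - k + 1
        y = []
        for w_j, b_j in zip(alpha, beta):
            row = []
            for t in range(L_out):
                s = b_j
                for c, w in enumerate(w_j):
                    s += sum(w[m] * x[c][t + m] for m in range(k))
                row.append(s)
            y.append(row)
        out.append(y)
    return out


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of equation (8) for a scalar input x_t and hidden size H.

    params[g] = (w_x, W_h, b) for g in "ifgo": w_x has length H, W_h is H x H,
    and b (length H) stands for the combined bias b_i* + b_h*.
    """
    H = len(h_prev)

    def pre(g):
        w_x, W_h, b = params[g]
        return [w_x[r] * x_t + sum(W_h[r][s] * h_prev[s] for s in range(H)) + b[r]
                for r in range(H)]

    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    g = [math.tanh(z) for z in pre("g")]
    o = [sigmoid(z) for z in pre("o")]
    c = [f[r] * c_prev[r] + i[r] * g[r] for r in range(H)]
    h = [o[r] * math.tanh(c[r]) for r in range(H)]
    return h, c


def linear(X, A, b, activation=lambda z: max(0.0, z)):
    """Equation (9): Y = sigma(X A^T + b), with ReLU as the default sigma.

    X: [N][H_in], A: [H_out][H_in], b: [H_out].
    """
    return [[activation(sum(a_j[k] * x[k] for k in range(len(x))) + b_j)
             for a_j, b_j in zip(A, b)]
            for x in X]
```

Chained in the order of Figure 3, the single-channel output of `conv1d` for one sample would be fed to `lstm_step` one time step at a time, and the final hidden state passed to `linear` to produce the output scores.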

Experimental Results and Evaluation
In the experiment, data on four postures (shooting, passing, dribbling, and catching) were collected from 100 basketball players. About 100 sets of data were collected for each posture of each player, giving 40,000 samples in total (100 players × 4 postures × 100 sets). The model was trained on these data for 100 epochs. During data collection, the testers performed the prescribed basketball movements according to the preset postures and their usual exercise habits. A classifier is constructed from the characteristics of the athletes' body movements to identify the posture of the basketball players. Four classical classifiers, namely random forest, support vector machine (SVM), SOM neural network, and Bayesian network, were compared with our model on the validation set of basketball motion poses. The comparison of experimental results is shown in Table 1.
The experimental results show that, for basketball pose recognition, our model has the highest average recognition accuracy, reaching 99.3%. Among the other models, the SVM algorithm has the highest average recognition accuracy, 97.1%. The average recognition accuracies of the SOM neural network, Bayesian network, and random forest are 91.5%, 90.8%, and 89.4%, respectively. This result verifies the accuracy of the proposed basketball motion recognition method based on multi-feature fusion and DRQN.
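The paper reports average recognition accuracy without defining it; a common reading is the mean of the per-class accuracies computed from a confusion matrix. The sketch below assumes that definition, and the confusion matrix in its usage block is hypothetical, not data from Table 1.

```python
POSTURES = ("shooting", "passing", "dribbling", "catching")


def per_class_accuracy(confusion):
    # confusion[i][j] = number of class-i samples predicted as class j.
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def average_accuracy(confusion):
    # Unweighted mean over classes (equals overall accuracy when classes are balanced).
    accs = per_class_accuracy(confusion)
    return sum(accs) / len(accs)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    confusion = [
        [98, 1, 1, 0],
        [0, 99, 1, 0],
        [0, 0, 100, 0],
        [2, 0, 0, 98],
    ]
    for name, acc in zip(POSTURES, per_class_accuracy(confusion)):
        print(f"{name}: {acc:.1%}")
    print(f"average: {average_accuracy(confusion):.2%}")
```

Because the dataset here has the same number of samples per posture, the class-averaged figure and the overall accuracy would coincide under this definition.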
This is because, compared with these traditional machine learning algorithms, DRQN has a deeper network structure and extracts deeper features of basketball poses, thereby improving recognition accuracy. Figure 5 shows the distribution of the model's outputs after the first epoch and after the 100th epoch. The red dots are the characteristic distribution of the samples, and

Conclusion
In recent years, with the development of wireless sensor networks and microelectronic devices, human posture recognition has attracted extensive attention in fields such as health, sports, games, and film. Building on human posture recognition, this study investigated the recognition of athletes' movement postures in basketball and proposed a new basketball posture recognition method based on DRQN. Inertial sensors are used to collect athletes' posture data; after feature extraction, PCA is used to reduce the dimensionality of the features. The LSTM layer gives our model a certain memory capacity. The added one-dimensional convolutional layer strengthens the model's feature extraction on top of this memory, so it can process information in the time dimension more efficiently. The one-dimensional convolutional layer also increases the fitting ability and stability of the neural network, making the training of deep reinforcement learning more stable. After 100 epochs, the accuracy of recognizing basketball poses is significantly improved, and our model outperforms the other compared methods for recognizing basketball motion posture.
Data Availability
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work.