Research on Methods of Physical Aided Education Based on Deep Learning

To better meet the training needs of sports and improve the standardization of sports training, this paper proposes an OpenPose-based sports posture estimation method and an assisted training system built on the basic structure and principle of the OpenPose network. First, a human posture estimation algorithm is constructed on the OpenPose network; second, the overall framework, operation process, and modules (image acquisition, posture estimation, and others) of the sports-assistance system are designed in detail; finally, the constructed OpenPose posture estimation method is validated. The results show that the loss function of the algorithm gradually stabilizes after 250 iterations. With the COCO dataset as the training base and comparison against standard postures, the algorithm correctly identifies different badminton action postures, with a recognition rate of up to 94%. This indicates that the algorithm is feasible for estimating badminton postures and supporting badminton training.


Related Work
As living standards rise, people pay increasing attention to personal health, and physical health and sports have become widely discussed topics. However, most people have not mastered standard movement postures, so they cannot obtain the best exercise effect and may even suffer unnecessary injuries during exercise. Therefore, automatic recognition of human movement is needed. In the past, people relied on auxiliary equipment to recognize human posture and judge whether a movement was standard. As machine learning and deep learning algorithms have matured, researchers have proposed diverse human movement recognition algorithms, including SVM classifiers, image processing methods, and deep neural networks. Within human motion recognition, researchers have also pioneered techniques such as human posture estimation and human action recognition.
Human posture estimation has long been a popular research topic. For example, Amir Nadeem et al. proposed the A-HPE method, which detects human body parts by combining significant profile detection, an entropy Markov model, multidimensional cues from the whole-body profile, and a robust body part model, and evaluated it on four benchmark data sets; its detection accuracy is significantly higher than that of traditional algorithms, and it can also support human-computer interaction [1]. Poojitha Sing obtained data on the components of each body part from point cloud measurements of human posture in RGB images, which avoids feature ambiguity and gives better detection performance [2]. Xinwei Li estimated human joint moments by analyzing the dynamic human-machine interaction between human elbow torque and exoskeleton output [3]. Wei Quan et al. built an unsupervised learning algorithm on a forward kinematics model of the human skeleton and optimized it with integrated particle swarm optimization (PSO); the optimized algorithm requires no pretraining data, gives a more concise estimate of body posture, and was evaluated in a series of experiments [4]. Many scholars have also studied human motion recognition. Xiaojun Zhang built a deep-learning-based human motion recognition technique that uses an LSTM to optimize the deep learning model, although it requires advanced smart wearable devices [5]. Bi Zhuo built a multimodal deep neural network model based on a joint cost function, which recognizes human motion on the MSR Action3D data set with excellent overall performance [6]. Liu Shuqin proposed a human posture estimation method based on a discrete-point 3D reconstruction algorithm, in which data features are extracted by principal component analysis and posture is then estimated through two-dimensional posture prediction [7]. Jalal Ahmad et al. proposed a 3D Cartesian feature extraction approach that yields features with rich information [8]. Licciardo Gian Domenico et al. proposed an FCN-based posture estimation method that achieved an average accuracy of 96.77% on 17 postures [9].
Building on the above research, this study aims to build an auxiliary system for badminton training that estimates posture by matching key human skeletal points, so as to better assist badminton enthusiasts in their training. The contribution of this study is to extract sports postures with deep learning and then, through similarity comparison, establish a standard for judging sports actions, providing more accurate reference information for sports training.

Estimation of Human Key Bone Points Based on the OpenPose Model.
In the OpenPose model used here, each 7×7 convolution kernel of the earlier PAF output branch is replaced by three consecutive 3×3 kernels. This preserves the receptive field while greatly reducing computation, which effectively improves the efficiency of the network model. Following DenseNet, the outputs of the three convolution kernels are concatenated, so the network preserves both high-level and low-level features. The network structure of the OpenPose model is shown in Figure 1.
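As a quick check of the receptive-field and computation claim, the following sketch computes the receptive field of stacked convolutions and compares weight counts. It assumes stride 1, no dilation, biases ignored, and every layer mapping C channels to C channels; the width C = 128 is a hypothetical value, not taken from the paper.

```python
def stacked_receptive_field(kernel: int, layers: int) -> int:
    """Receptive field of `layers` stacked stride-1, undilated k x k convolutions."""
    return 1 + layers * (kernel - 1)


def conv_weight_count(kernel: int, channels: int, layers: int = 1) -> int:
    """Weights (biases ignored) when each of `layers` layers maps `channels` -> `channels`."""
    return layers * kernel * kernel * channels * channels


if __name__ == "__main__":
    c = 128  # hypothetical channel width
    print(stacked_receptive_field(3, 3), stacked_receptive_field(7, 1))  # 7 7
    three_3x3 = conv_weight_count(3, c, layers=3)
    one_7x7 = conv_weight_count(7, c)
    print(three_3x3, one_7x7, round(three_3x3 / one_7x7, 3))
```

Under these assumptions, three 3×3 layers cover the same 7×7 window with 27/49 (about 55%) of the weights of a single 7×7 layer, independent of the chosen width.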
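In the first stage, the convolutional neural network generates a set of part affinity fields (PAFs) from the image features $\mathbf{F}$. In each subsequent stage, the predictions of the previous stage are concatenated with the original image features $\mathbf{F}$, so that more accurate predictions can be made [10]:
$$\mathbf{L}^{1} = \phi^{1}(\mathbf{F}), \qquad \mathbf{L}^{t} = \phi^{t}(\mathbf{F}, \mathbf{L}^{t-1}), \quad 2 \le t \le T_P$$
where $\phi^{t}$ denotes the convolutional neural network at stage $t \le T_P$ and $T_P$ is the total number of PAF prediction stages.
After $T_P$ iterations, the latest PAF prediction is used as the starting point and the process is repeated to predict the confidence maps [11]:
$$\mathbf{S}^{T_P} = \rho^{T_P}(\mathbf{F}, \mathbf{L}^{T_P}), \qquad \mathbf{S}^{t} = \rho^{t}(\mathbf{F}, \mathbf{L}^{T_P}, \mathbf{S}^{t-1}), \quad T_P < t \le T_P + T_C$$
where $\rho^{t}$ denotes the convolutional neural network at stage $t$, $T_P \le t \le T_P + T_C$, and $T_C$ is the total number of confidence map prediction stages.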
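The data dependencies of the two-phase cascade above can be illustrated with a minimal sketch in which each stage is an arbitrary callable. The toy string-producing stages `phi` and `rho` are hypothetical stand-ins for the CNN blocks $\phi^t$ and $\rho^t$; they only make visible which inputs each stage receives and say nothing about how OpenPose implements its stages.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# phi_t(F, L_prev) -> L_t ; L_prev is None at the first PAF stage
PafStage = Callable[[Any, Optional[Any]], Any]
# rho_t(F, L_TP, S_prev) -> S_t ; S_prev is None at the first confidence-map stage
ConfStage = Callable[[Any, Any, Optional[Any]], Any]


def run_two_phase_cascade(features: Any,
                          paf_stages: Sequence[PafStage],
                          conf_stages: Sequence[ConfStage]) -> Tuple[Any, Any]:
    """Run the PAF stages, then the confidence-map stages conditioned on the final PAFs."""
    pafs = None
    for phi_t in paf_stages:
        pafs = phi_t(features, pafs)          # each stage sees F and the previous PAFs
    conf = None
    for rho_t in conf_stages:
        conf = rho_t(features, pafs, conf)    # F, final PAFs L^{T_P}, previous maps
    return pafs, conf


# Hypothetical toy stages that just record their inputs as strings.
def phi(F: str, L_prev: Optional[str]) -> str:
    return f"phi({F})" if L_prev is None else f"phi({F},{L_prev})"


def rho(F: str, L: str, S_prev: Optional[str]) -> str:
    return f"rho({F},{L})" if S_prev is None else f"rho({F},{L},{S_prev})"


pafs, conf = run_two_phase_cascade("F", [phi, phi], [rho, rho])
print(pafs)  # phi(F,phi(F))
print(conf)
```

Every confidence-map stage receives the same final PAF prediction, while each stage in either phase also refines the output of its predecessor.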

Estimation of Human Key Bone Points Based on the Optimized Model Structure.
The OpenPose model is designed to recognize and estimate the postures of multiple people. Its innovations lie in three aspects. First, part affinity fields (PAFs), vector fields defined over the limbs, are introduced to estimate confidence maps of the limbs that connect pairs of key points in the human pose model; combined with the key point heatmaps, they strengthen the constraints between key points and simplify the assignment of key points to individuals in multiperson scenes. Second, six stages are cascaded: each stage re-estimates the key point heatmaps and limb confidence maps output by the previous stage, which further improves estimation accuracy.
Third, during training the loss of every stage is supervised so that the overall loss is minimized. According to test results, the OpenPose model offers high estimation accuracy but suffers from long estimation time.
In badminton, athletes' postures change quickly, and actions occur at a higher frequency than ordinary human movement. To track posture changes in real time, the estimation and evaluation modules of the human posture evaluation system must run efficiently. The OpenPose model estimates the head region from five head key points, which have little influence on a badminton player's body posture but indirectly lengthen estimation time [12]. Therefore, taking the modified human posture model as reference, this paper builds a new deep neural network model to estimate the two-dimensional coordinates of a badminton player's skeletal points in a single-frame image. Its architecture is shown in Figure 2 [13].
First, the VGG neural network is introduced into the improved human posture evaluation model, and posture evaluation is then performed through the two-stage processing of OpenPose. The VGG network in Figure 2 is a convolutional neural network (CNN), which is used to extract image features. A CNN consists of convolutional layers, pooling layers, fully connected layers, and an output layer. The convolution operation is
$$x_j^{l} = f\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\Big) \tag{1}$$
where $k$ represents the convolution kernel, $l$ the layer index, $M_j$ the set of input feature maps used to compute the $j$th output feature map, $b$ the bias term, and $f$ the activation function.
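Formula (1) can be illustrated with a small pure-Python sketch that computes one output feature map. It assumes "valid" output size and true convolution (the kernel is flipped, as MATLAB's `conv2` does, which matches the convolution used later in formula (10)); the logistic sigmoid for $f$ and all sizes are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def conv2_valid(x: Matrix, k: Matrix) -> Matrix:
    """'valid' 2-D convolution with the kernel flipped, like MATLAB conv2(x, k, 'valid')."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for u in range(oh):
        for v in range(ow):
            s = 0.0
            for p in range(kh):
                for q in range(kw):
                    # flipped indexing turns this into true convolution
                    s += x[u + kh - 1 - p][v + kw - 1 - q] * k[p][q]
            out[u][v] = s
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def conv_layer_output(inputs: Sequence[Matrix], kernels: Sequence[Matrix],
                      bias: float, f: Callable[[float], float] = sigmoid) -> Matrix:
    """Formula (1) for one output map j: f(sum_{i in M_j} x_i^{l-1} * k_ij^l + b_j^l).

    `inputs` holds the maps x_i^{l-1} for i in M_j; `kernels` holds the matching k_ij^l.
    """
    total = None
    for x_i, k_ij in zip(inputs, kernels):
        c = conv2_valid(x_i, k_ij)
        if total is None:
            total = c
        else:
            total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, c)]
    return [[f(v + bias) for v in row] for row in total]
```

Each output map has its own kernel per input map, which is why `kernels` is indexed alongside `inputs` rather than shared.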

Scientific Programming
The calculation of the pooling layer is
$$x_j^{l} = f\big(\beta_j^{l}\,\mathrm{down}(x_j^{l-1}) + b_j^{l}\big) \tag{2}$$
where down represents the downsampling function and $\beta$ and $b$ are the multiplicative and additive biases of each output feature map, respectively. Training a CNN involves forward propagation and back propagation.
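Formula (2) can be sketched as follows. The text does not define down(); this sketch assumes it sums each non-overlapping n×n block (so choosing $\beta = 1/n^2$ gives mean pooling), and the function names are illustrative.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def down(x: Matrix, n: int) -> Matrix:
    """Sum over non-overlapping n x n blocks (assumed form of down())."""
    rows, cols = len(x) // n, len(x[0]) // n
    return [[float(sum(x[r * n + a][c * n + b] for a in range(n) for b in range(n)))
             for c in range(cols)]
            for r in range(rows)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pooling_layer_output(x: Matrix, beta: float, bias: float, n: int = 2,
                         f: Callable[[float], float] = sigmoid) -> Matrix:
    """Formula (2): x_j^l = f(beta_j^l * down(x_j^{l-1}) + b_j^l)."""
    return [[f(beta * v + bias) for v in row] for row in down(x, n)]


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(down(x, 2))                                        # [[14.0, 22.0], [46.0, 54.0]]
    print(pooling_layer_output(x, 0.5, 1.0, f=lambda z: z))  # [[8.0, 12.0], [24.0, 28.0]]
```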
In forward propagation, the input $X_p$ of a sample $(X_p, Y_p)$ is fed to the CNN and transformed layer by layer to obtain the actual output:
$$O_p = F_n\big(\cdots F_2\big(F_1(X_p W^{(1)})\,W^{(2)}\big)\cdots W^{(n)}\big) \tag{3}$$
where $F_i$ and $W^{(i)}$ denote the activation function and weight matrix of the $i$th layer. In back propagation, the error between the actual output $O_p$ and the target output $Y_p$ is computed and propagated backward according to the principle of error minimization, and the weights are adjusted continually. The training process of the CNN is described below.

Back-Propagation Algorithm.
The squared-error cost function is used to measure the error of forward propagation. With $c$ classes and $N$ training samples, the error $E^N$ is
$$E^{N} = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{c}\big(t_k^{n} - y_k^{n}\big)^2 \tag{4}$$
In formula (4), $t_k^n$ and $y_k^n$ are the $k$th dimensions of the target output and the actual output of the $n$th sample, respectively.
In back propagation, the sensitivity of the bias is used to represent the propagated error; it is the rate of change of the error with respect to the bias $b$:
$$\delta = \frac{\partial E}{\partial b} = \frac{\partial E}{\partial u}\,\frac{\partial u}{\partial b} \tag{5}$$
In formula (5), $u$ is the total input of the neuron. Since $\partial u/\partial b = 1$, we have $\partial E/\partial b = \partial E/\partial u = \delta$; that is, the sensitivity of a neuron's bias $b$ equals the derivative $\partial E/\partial u$ of the error $E$ with respect to its total input $u$.
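Formulas (4) and (5) can be checked numerically. The sketch below computes $E^N$ and, for a single sigmoid neuron with total input $u = wx + b$ (an illustrative assumption, not the paper's network), compares the analytic sensitivity $\delta = \partial E/\partial u$ with a finite-difference estimate of $\partial E/\partial b$.

```python
import math
from typing import Sequence


def squared_error(targets: Sequence[Sequence[float]],
                  outputs: Sequence[Sequence[float]]) -> float:
    """Formula (4): E^N = 1/2 * sum over samples n and classes k of (t_k^n - y_k^n)^2."""
    return 0.5 * sum((t - y) ** 2
                     for t_row, y_row in zip(targets, outputs)
                     for t, y in zip(t_row, y_row))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def neuron_error(w: float, x: float, b: float, t: float) -> float:
    """Error of a single sigmoid neuron with total input u = w*x + b."""
    y = sigmoid(w * x + b)
    return 0.5 * (t - y) ** 2


def bias_sensitivity(w: float, x: float, b: float, t: float) -> float:
    """delta = dE/du = (y - t) * f'(u); equals dE/db because du/db = 1 (formula (5))."""
    y = sigmoid(w * x + b)
    return (y - t) * y * (1.0 - y)


def numeric_bias_gradient(w: float, x: float, b: float, t: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of dE/db, used only as a check."""
    return (neuron_error(w, x, b + h, t) - neuron_error(w, x, b - h, t)) / (2.0 * h)


if __name__ == "__main__":
    print(squared_error([[1.0, 0.0]], [[0.5, 0.5]]))  # 0.25
    print(bias_sensitivity(0.8, 1.5, -0.3, 1.0), numeric_bias_gradient(0.8, 1.5, -0.3, 1.0))
```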

Weight Update.
Weight update of the convolutional layer: the forward computation of this layer follows formula (1). The input feature maps are convolved with trainable convolution kernels, a bias term is added, and the output feature map is obtained through an activation function. $M_j$ denotes the combination of input feature maps, and each output feature map has its own convolution kernels: even if output maps $j$ and $k$ are both obtained by convolving input map $i$, their corresponding kernels differ.
If each convolutional layer $l$ is followed by a downsampling layer $l+1$, one pixel of the downsampling layer's sensitivity map corresponds to a block of pixels in the output feature map of convolutional layer $l$. To compute the sensitivity of convolutional layer $l$, the sensitivity map of the downsampling layer is upsampled to the size of the feature map of layer $l$, multiplied element-wise by the derivative of the activation function, and multiplied by the parameter $\beta$:
$$\delta_j^{l} = \beta_j^{l+1}\big(f'(u_j^{l}) \circ \mathrm{up}(\delta_j^{l+1})\big) \tag{6}$$
Here, up represents the upsampling operation and $\circ$ represents element-wise multiplication. If the downsampling factor is $n$, upsampling replicates each pixel $n$ times horizontally and vertically to recover the original size, which can be implemented with a Kronecker product:
$$\mathrm{up}(x) \equiv x \otimes \mathbf{1}_{n\times n} \tag{7}$$
Given the sensitivity map of a feature map of the convolutional layer, the gradient of the bias $b$ is the sum of all elements of the sensitivity map:
$$\frac{\partial E}{\partial b_j} = \sum_{u,v}\big(\delta_j^{l}\big)_{uv} \tag{8}$$
Because weights are shared, the gradient of a kernel weight is obtained by summing the gradients over all connections that use that weight:
$$\frac{\partial E}{\partial k_{ij}^{l}} = \sum_{u,v}\big(\delta_j^{l}\big)_{uv}\big(p_i^{l-1}\big)_{uv} \tag{9}$$
Here, $(p_i^{l-1})_{uv}$ is the patch of $x_i^{l-1}$ that is multiplied element-wise by $k_{ij}^{l}$ during convolution to produce element $(u,v)$ of the output map, namely, one unit input of convolutional layer $l$. Formula (9) can be computed with the convolution function in MATLAB:
$$\frac{\partial E}{\partial k_{ij}^{l}} = \mathrm{rot180}\Big(\mathrm{conv2}\big(x_i^{l-1}, \mathrm{rot180}(\delta_j^{l}), \text{'valid'}\big)\Big) \tag{10}$$
Here, rot180 rotates a matrix by 180 degrees. Rotating the sensitivity map turns the convolution into a cross-correlation, and the result is rotated back so that the kernel has the expected orientation in the forward convolution.
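Formulas (6) to (8) can be sketched in pure Python. The sketch assumes a sigmoid activation (so $f'(u) = f(u)(1 - f(u))$) and a down() that sums each n×n block, for which replicating each sensitivity value with the Kronecker product is the exact adjoint; with mean pooling an extra factor of $1/n^2$ would appear. Function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def up(x: Matrix, n: int) -> Matrix:
    """Formula (7): Kronecker product x (x) 1_{n x n}, i.e. replicate each element n x n times."""
    return [[x[r // n][c // n] for c in range(len(x[0]) * n)]
            for r in range(len(x) * n)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)


def conv_layer_sensitivity(delta_next: Matrix, u: Matrix, beta: float, n: int) -> Matrix:
    """Formula (6): delta^l = beta^{l+1} * (f'(u^l) o up(delta^{l+1})), sigmoid f assumed."""
    upsampled = up(delta_next, n)
    return [[beta * sigmoid_prime(u[r][c]) * upsampled[r][c]
             for c in range(len(u[0]))]
            for r in range(len(u))]


def bias_gradient(delta: Matrix) -> float:
    """Formula (8): sum of all elements of the sensitivity map."""
    return sum(sum(row) for row in delta)


if __name__ == "__main__":
    print(up([[1, 2], [3, 4]], 2))
    u = [[0.0] * 4 for _ in range(4)]           # pre-activations of conv layer l
    d = conv_layer_sensitivity([[1.0, 2.0], [3.0, 4.0]], u, beta=2.0, n=2)
    print(bias_gradient(d))                     # 20.0
```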
Weight update of the downsampling layer: the forward computation of the downsampling layer follows formula (2). Once the sensitivity map of the downsampling layer has been calculated, the updates of the parameters $\beta$ and $b$ can be computed, with the gradient of $b$ given by formula (8).
If the current downsampling layer is fully connected to the subsequent convolutional layer, its sensitivity map can be calculated with the BP algorithm by back-propagating the sensitivity:
$$\delta_j^{l} = f'(u_j^{l}) \circ \mathrm{conv2}\big(\delta_j^{l+1}, \mathrm{rot180}(k_j^{l+1}), \text{'full'}\big) \tag{11}$$
Here, $\delta_j^{l+1}$ is the sensitivity propagated back from the convolutional layer that follows the current downsampling layer, and $\mathrm{rot180}(k_j^{l+1})$ is the rotated convolution kernel. Then the gradients of the biases $\beta$ and $b$ are computed. The gradient of $b$ is the sum of all elements of the sensitivity map, as in formula (8) for the convolutional layer.
For the gradient of the multiplicative bias $\beta$, the downsampled map from forward propagation is required:
$$d_j^{l} = \mathrm{down}(x_j^{l-1}) \tag{12}$$
Thus, the gradient of $\beta$ is
$$\frac{\partial E}{\partial \beta_j} = \sum_{u,v}\big(\delta_j^{l} \circ d_j^{l}\big)_{uv} \tag{13}$$
Through the above construction, the VGG network structure used in this study is obtained, as shown in Figure 3.
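The bias gradients of a downsampling layer, formulas (12), (13), and (8), reduce to a few sums once the sensitivity map is known. This sketch assumes the same block-sum down() as above and takes the sensitivity map as given (for example, from formula (11)); names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def down(x: Matrix, n: int) -> Matrix:
    """Sum over non-overlapping n x n blocks (assumed form of down())."""
    rows, cols = len(x) // n, len(x[0]) // n
    return [[float(sum(x[r * n + a][c * n + b] for a in range(n) for b in range(n)))
             for c in range(cols)]
            for r in range(rows)]


def downsampling_gradients(delta: Matrix, x_prev: Matrix, n: int) -> Tuple[float, float]:
    """Return (dE/dbeta_j, dE/db_j) for one downsampling map.

    delta  : sensitivity map delta_j^l of the downsampling layer
    x_prev : input map x_j^{l-1} that was downsampled in the forward pass
    """
    d = down(x_prev, n)  # formula (12): d_j^l = down(x_j^{l-1})
    grad_beta = sum(dl * dv for d_row, v_row in zip(delta, d)
                    for dl, dv in zip(d_row, v_row))  # formula (13)
    grad_b = sum(sum(row) for row in delta)           # formula (8)
    return grad_beta, grad_b


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(downsampling_gradients([[1.0, 0.0], [0.0, 1.0]], x, 2))  # (68.0, 2.0)
```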
Through the above image processing, combined with the two-stage posture estimation in Figure 2, the posture of human motion is obtained.

Similarity Calculation.
The above improvements concern how to estimate human posture while ensuring real-time performance and accuracy. Once a reliable set of skeletal points referenced to the modified human pose model has been estimated, the key problem of the human posture evaluation system becomes how to identify the posture from these points. Since badminton is mainly an upper-limb sport, the standard postures of the various badminton strokes are concentrated in the upper-limb area. Therefore, on the basis of skeletal point estimation, similarity is used to compare the posture of a badminton enthusiast with a library of standard badminton actions, so as to evaluate badminton actions objectively. The evaluation process consists of three steps: (1) transform the coordinates of the input skeletal points; (2) match them against the standard posture library; (3) process and output the matching results.
A set of skeletal point coordinates from a single-frame image, expressed in the image pixel coordinate system, is input into the human posture evaluation system, with the modified human posture model as the reference. In the human posture coordinate set index_T, each coordinate group contains 13 points, as shown in Figure 4.
In the evaluation stage, the input coordinate system (the image pixel coordinate system) must first be transformed to prepare for posture evaluation. Conventional coordinate conversion involves the camera's intrinsic and extrinsic parameter matrices: the former is fixed, while the latter depends on the position and angle of the camera lens, so the extrinsic matrix must be calibrated in advance to remain valid. Although this process is feasible, it is difficult to carry out in practice. This paper therefore proposes a new coordinate transformation method: (a) convert from the image pixel coordinate system to a rectangular coordinate system whose origin is the neck point among the skeletal points; (b) convert this rectangular coordinate system to a polar coordinate system with the neck point as origin, and determine the angle between each of the other 12 points and the positive x-axis. Step (a) solves the mismatch between the evaluated posture and the standard posture caused by different positions, and step (b) removes the uncertainty in posture evaluation caused by differences in individual body size.
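After the transformation, the 12 angles between the positive x-axis and the vectors numbered 0 to 11 are obtained, as shown in Figure 5 [14]. The coordinate transformation is calculated as follows: (i) subtract the pixel coordinates of the neck point from those of the other 12 points to obtain their coordinates relative to the neck; (ii) form the vector set $\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_{11}$ from the neck point to these points; (iii) apply formula (14) to compute the angle between each vector in the set and the positive x-axis and establish the angle set $\{\theta_0, \theta_1, \ldots, \theta_{11}\}$ [15]:
$$\theta_i = \arccos\frac{\mathbf{v}_i \cdot \mathbf{e}_x}{\lVert \mathbf{v}_i \rVert}, \qquad i = 0, 1, \ldots, 11 \tag{14}$$
where $\theta_i$ is the angle obtained from its cosine and $\mathbf{e}_x$ is the unit vector along the positive x-axis.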
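The coordinate transformation and formula (14) can be sketched as below. The sketch assumes pixel coordinates as (x, y) pairs; the function name is hypothetical. Note that arccos returns values in [0, π], so it cannot distinguish two points mirrored across the x-axis (and image y grows downward); `math.atan2(dy, dx)` would give a full polar angle, but that variant is not stated in the text.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def neck_relative_angles(points: Sequence[Point], neck: Point) -> List[float]:
    """Angles (radians) between each neck-to-point vector and the positive x-axis.

    Step (i): translate so the neck is the origin.
    Step (ii): treat each translated point as a vector v_i.
    Step (iii): theta_i = arccos(v_i . e_x / |v_i|), formula (14).
    """
    angles = []
    for x, y in points:
        dx, dy = x - neck[0], y - neck[1]            # steps (i) and (ii)
        norm = math.hypot(dx, dy)                    # |v_i|
        if norm == 0.0:
            raise ValueError("a skeletal point coincides with the neck point")
        cos_theta = max(-1.0, min(1.0, dx / norm))   # guard against rounding
        angles.append(math.acos(cos_theta))          # step (iii)
    return angles
```

Because each angle depends only on the direction of $\mathbf{v}_i$, scaling the whole skeleton about the neck leaves the angle set unchanged, which is why step (b) removes the effect of body size.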

Matching with the Standard Posture.
The human posture evaluation algorithm divides the human body into four regions, as shown in Figure 6 [16].
The other points of the human posture model are transformed relative to the neck point, and their serial numbers are adjusted accordingly. The right upper-limb region consists of the right elbow, right shoulder, and right wrist, with adjusted serial numbers 0, 1, and 2. The left upper-limb region consists of the left elbow, left shoulder, and left wrist, with adjusted serial numbers 3, 4, and 5. The right lower-limb region consists of the right knee, right hip, and right ankle, with adjusted serial numbers 6, 7, and 8. The left lower-limb region consists of the left knee, left hip, and left ankle, with adjusted serial numbers 9, 10, and 11. For each region, the posture to be evaluated is compared with the candidate standard postures retained from the previous stage, and the accumulated error is calculated. If the accumulated error does not exceed the allowable error of that stage, the standard posture is kept in the candidate standard posture set.
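The region layout and a single filtering stage can be sketched as follows. The `REGIONS` table mirrors the serial numbers listed above; the library layout (a dictionary from standard-posture serial number to its 12 reference angles), the function name, and the tolerance in the example are hypothetical. The region error is taken as the sum of absolute angle differences, which is one reading of the "accumulated error".

```python
from typing import Dict, Iterable, List, Sequence

# Adjusted serial numbers of the 12 neck-relative angles, per region (from the text).
REGIONS: Dict[str, Sequence[int]] = {
    "right_upper": (0, 1, 2),
    "left_upper": (3, 4, 5),
    "right_lower": (6, 7, 8),
    "left_lower": (9, 10, 11),
}


def filter_candidates(angles: Sequence[float],
                      library: Dict[int, List[float]],
                      candidates: Iterable[int],
                      region: str,
                      tolerance: float) -> Dict[int, float]:
    """Keep candidate standard postures whose accumulated error in `region` is within tolerance.

    Returns {standard posture serial number: accumulated absolute angle error}.
    """
    idx = REGIONS[region]
    kept: Dict[int, float] = {}
    for serial in candidates:
        reference = library[serial]
        err = sum(abs(angles[i] - reference[i]) for i in idx)
        if err <= tolerance:
            kept[serial] = err
    return kept


if __name__ == "__main__":
    library = {0: [0.0] * 12, 1: [0.1, 0.1, 0.1] + [0.0] * 9}  # hypothetical library
    print(filter_candidates([0.0] * 12, library, library.keys(), "right_upper", 0.2))
```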

Human Posture Assessment Process.
Based on the above analysis, human posture evaluation consists of the following steps. For the right upper-limb region (right shoulder, right elbow, and right wrist), the angles between the positive x-axis and the three vectors from the neck to these points are computed to build the similarity set of the right upper-limb region. The absolute differences from the corresponding standard postures are then computed, and standard postures within the predetermined error value are retained.
For the right lower-limb region (right hip, right knee, and right ankle), the angles between the positive x-axis and the three vectors from the neck to these points are computed to build the similarity set of the right lower-limb region. The absolute differences from the corresponding standard postures are then computed, and similar standard postures are filtered with the predetermined error value.
For the left lower-limb region (left hip, left knee, and left ankle), the angles between the positive x-axis and the three vectors from the neck to these points are computed to build the similarity set of the left lower-limb region. The absolute differences from the corresponding standard postures are then computed, and similar standard postures are filtered with the predetermined error value. After this screening, the remaining standard posture is the evaluation result, and the accumulated similarity is the evaluation value. The process is shown in Figure 7 [17,18]. To account for similarity differences during matching, the regions could be weighted according to their influence on posture, but this may cause a coupling problem: after weighting, two different postures may produce nearly identical values. Therefore, this paper finally outputs the skeletal points, the evaluation value, and the serial number of the matched standard posture directly at the end of matching [19].
Since a badminton player holding the racquet in the right hand uses the left hand mainly to keep balance, the algorithm in this paper omits matching of the left upper-limb region and optimizes the weights of the other three regions: the right upper-limb region has the largest weight, followed by the right and left lower-limb regions. Figure 8 shows the design of the human posture evaluation algorithm [20].
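Putting the stages together, the sketch below runs the three retained regions in the order right upper limb, right lower limb, left lower limb, carries the accumulated error forward, and reports the best surviving standard posture. It is an illustration under stated assumptions: region weights are not modeled, the returned dictionary is a hypothetical report format rather than the system's "−1/0/1" codes, and all names and tolerances are illustrative.

```python
from typing import Dict, List, Sequence

# Retained regions in matching order; the left upper limb is skipped (from the text).
ORDER = ("right_upper", "right_lower", "left_lower")
REGIONS: Dict[str, Sequence[int]] = {
    "right_upper": (0, 1, 2),
    "right_lower": (6, 7, 8),
    "left_lower": (9, 10, 11),
}


def match_posture(angles: Sequence[float],
                  library: Dict[int, List[float]],
                  tolerances: Sequence[float]) -> dict:
    """Staged matching: a candidate survives a stage if its accumulated error <= that stage's tolerance."""
    candidates = {serial: 0.0 for serial in library}   # serial -> accumulated error
    for stage, region in enumerate(ORDER):
        idx = REGIONS[region]
        survivors = {}
        for serial, acc in candidates.items():
            err = sum(abs(angles[i] - library[serial][i]) for i in idx)
            if acc + err <= tolerances[stage]:
                survivors[serial] = acc + err
        if not survivors:
            return {"matched": False, "failed_stage": stage}
        candidates = survivors
    best = min(candidates, key=candidates.get)
    return {"matched": True, "serial": best, "loss": candidates[best]}


if __name__ == "__main__":
    library = {0: [0.0] * 12, 1: [0.05] * 12}           # hypothetical library
    print(match_posture([0.05] * 12, library, (0.2, 0.4, 0.6)))
    print(match_posture([1.0] * 12, library, (0.2, 0.4, 0.6)))
```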

Matching Result Processing.
Human posture evaluation yields the serial number of the matched standard posture and the similarity to it. The matching results are processed as follows. If the output is "−1", "0", or "1", the right upper-limb region, which is the key region for matching a badminton player's posture, has failed to match. In that case, the athlete's posture in this frame does not correspond to any posture in the standard library; that is, it is judged to be nonstandard.
If any other information is output, a matching result has been obtained, and the higher the output serial number of the standard posture, the better the match.

Method and System Verification
To verify the above method, this study builds a badminton posture evaluation system.

Data Sources and Training.
Part of the video image data is selected as the basic data set for verification. Images are obtained from badminton videos, and the camera-collected images together with the human images in the COCO data set are used as the training data set. The training parameter settings are listed in Table 1.
Images in the COCO data set come with human limb grayscale maps and human skeletal point grayscale maps. Camera-collected images become a suitable training set only after the following processing: (1) normalize each image so that its pixel values lie in [−0.5, 0.5]; (2) set the pixel value of each human limb to 0.5 and save the result as the limb grayscale map; (3) set the pixel value of each human skeletal point to 0.5 and save the result as the skeletal point grayscale map. In the first training run, the model is trained on the COCO data set, which ensures that the optimized model can accurately estimate general human posture. Subsequent training runs do not start from the initial weights; they load the weights from the first run and train on the collected images to further improve evaluation accuracy.
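Preprocessing steps (1) to (3) can be sketched as below. The sketch assumes 8-bit images (pixel values 0 to 255), (row, col) pixel coordinates, and a background value of −0.5 for the label maps; the text states none of these. Rasterizing limbs into pixel lists is not shown, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Matrix = List[List[float]]


def normalize_image(pixels: List[List[int]], max_value: int = 255) -> Matrix:
    """Step (1): map pixel values from [0, max_value] to [-0.5, 0.5]."""
    return [[p / max_value - 0.5 for p in row] for row in pixels]


def mark_label_map(height: int, width: int,
                   coords: Iterable[Tuple[int, int]],
                   value: float = 0.5, background: float = -0.5) -> Matrix:
    """Steps (2) and (3): grayscale label map with `value` at each (row, col).

    The background value is an assumption; the text does not state it.
    """
    label = [[background] * width for _ in range(height)]
    for r, c in coords:
        label[r][c] = value
    return label


if __name__ == "__main__":
    print(normalize_image([[0, 255]]))           # [[-0.5, 0.5]]
    print(mark_label_map(2, 3, [(0, 1)]))        # one marked skeletal point
```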
A properly chosen base learning rate avoids the problems caused by an excessively large learning rate; the base learning rate in this paper is therefore set to 5e-5.

Loss Function Curve.
After the first training run on the COCO data set, the collected images are used for subsequent training. The resulting loss is shown in Figure 9.
Over multiple training runs, the loss decreases overall and its descent gradually flattens, indicating that training converges toward an optimum.

Accuracy and Timeliness of Skeletal Keypoint Estimation.
The estimation accuracy of the traditional OpenPose model and the structure-optimized OpenPose model for each skeletal point is compared statistically, and the data are shown in Figure 10.
As the figure shows, the estimation accuracy of the optimized model is slightly lower than that of the OpenPose model, and the accuracy for each skeletal point of the left limbs is lower than that of the right limbs.

Application Verification.
To further verify the feasibility of the algorithm, an experimental system is built.

Overall Architecture.
The human posture evaluation system consists of a camera acquisition module, a human posture evaluation module, and a prediction model module. Its output is the matching result and matching loss between the human posture in the current frame and the standard posture library. The matching result is the standard posture that best matches the human posture in the current frame, and the matching loss indicates the similarity between the two. The overall framework of the system is shown in Figure 11 [21].

System Operation Process.
The operation mechanism of the human posture evaluation system is shown in Figure 12 [22].

Camera Acquisition Module.
The camera acquisition module has two parts: hardware parameters and the software interface. For the hardware, the key is to set the camera's placement angle correctly and choose suitable camera parameters. Based on the above analysis, the camera should be placed on the left side of the badminton net, to the right of the badminton player, at an optimal height of 1.2 m. The relevant camera parameters are listed in Table 2.
For the software interface, the key is to use a camera interface layer that keeps drivers compatible, as shown in Figure 13 [23,24].
ICmera, the base class of this module, stores one worker function and four detection functions.
First, the number and IDs of the cameras used in the human posture evaluation system are determined and the initial deployment is completed. Then the camera resolution, frame rate, and other parameters are tuned according to the site environment and the evaluation requirements; for this purpose, the module provides two function interfaces, showParam and setParam. Finally, the function work removes irrelevant information from the collected image data, such as resolution, width, and height, and converts the collected image into a unified cv::Mat format. The base class ICmera provides compatibility with the driver modules of other cameras, and the subsequent evaluation process accesses cameras only through ICmera, so the human posture evaluation system is not sensitive to the camera model. As long as the camera parameters meet the requirements, a new driver can be added by inheriting from the base class. If the evaluation is not effective, the functions of ICmera can be called to debug the current camera parameters.
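The camera abstraction described above can be sketched as a small class hierarchy. This is a hypothetical Python rendering, not the authors' implementation: the class name `ICmera` and the method names `showParam`, `setParam`, and `work` come from the text, while `_grab_raw`, `FakeCamera`, the parameter names, and the nested-list frame standing in for `cv::Mat` are assumptions. The four detection functions are not specified in the text and are omitted.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Frame = List[List[int]]  # stand-in for a cv::Mat image (assumption)


class ICmera(ABC):
    """Hypothetical base class: evaluation code talks only to this interface."""

    def __init__(self, camera_id: int) -> None:
        self.camera_id = camera_id
        self._params: Dict[str, Any] = {}

    def showParam(self) -> Dict[str, Any]:
        """Return a copy of the current camera parameters for debugging."""
        return dict(self._params)

    def setParam(self, name: str, value: Any) -> None:
        """Set a camera parameter such as resolution or frame rate."""
        self._params[name] = value

    @abstractmethod
    def _grab_raw(self) -> Dict[str, Any]:
        """Driver-specific capture returning pixels plus driver metadata (hypothetical)."""

    def work(self) -> Frame:
        """Capture one image, drop driver metadata, and return a unified frame."""
        raw = self._grab_raw()
        return [list(row) for row in raw["pixels"]]


class FakeCamera(ICmera):
    """Hypothetical driver: a new camera is supported by inheriting from ICmera."""

    def _grab_raw(self) -> Dict[str, Any]:
        width = int(self._params.get("width", 4))
        height = int(self._params.get("height", 3))
        return {"width": width, "height": height,
                "pixels": [[0] * width for _ in range(height)]}


if __name__ == "__main__":
    cam = FakeCamera(camera_id=0)
    cam.setParam("width", 6)
    cam.setParam("height", 2)
    frame = cam.work()
    print(cam.showParam(), len(frame), len(frame[0]))
```

Keeping capture details behind an abstract method is what lets the evaluation code stay insensitive to the camera model: only the subclass knows how frames arrive.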

Effect Display.
Bone point heatmap: the output bone point heatmap is shown in Figure 14, and the evaluation effect of the human posture evaluation system in the test stage is shown in Figure 15.
Effect display: the proposed human posture evaluation algorithm is applied to successive single-frame images. Frames 138 to 139 are successfully matched to the standard posture. The evaluation effect of these 8 frames is shown in Figure 16.
Figure 15 shows two kinds of output. First, "Frame i: matching failure" means that frame i failed to match any standard posture. Second, "Frame i: ending stage A, matching standard posture serial number B, matching loss X" means that matching for frame i ended at stage A, the frame successfully matched standard posture B, and the loss between the two is X.

Detection Rate.
The human posture evaluation system is used to evaluate the postures in 6 videos. The number of frames to be tested and the number of frames detected in each video are listed in Table 3.

Conclusion
In summary, the above design applies the OpenPose neural network to real sports, providing a new reference method for precise sports training. The innovation of this paper is the improvement of posture estimation accuracy. In addition, by collecting badminton movements, real-time estimation of badminton postures is achieved, which offers a reference for applying this method.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work.

References
[1] A. Nadeem, J. Ahmad, and K. Kim, "Automatic Human Posture Estimation for Sport Activity Recognition with