Research on Aerobics Training and Evaluation Method Based on Artificial Intelligence-Aided Modeling

Traditional aerobics training methods suffer from a lack of auxiliary teaching conditions and low training efficiency. With the in-depth application of artificial intelligence and computer-aided training methods in aerobics teaching and practice, this paper proposes a local spatiotemporal-preserving Fisher vector (FV) coding method and an automatic scoring technique for monocular motion video. First, the histogram of oriented gradients (HOG) and the histogram of optical flow (HOF) are extracted to describe the posture and motion characteristics of the human body in motion video. After normalization and dimensionality reduction based on principal component analysis, discriminative human motion feature vectors are obtained. Then, the spatiotemporal pyramid method is used to embed spatiotemporal features in the FV coding, which improves the ability to identify the correctness and coordination of human movements. Finally, a linear model is established for each action class to determine the action score. In the key frame extraction experiment on aerobics action videos, the ST-FMP model improves the recognition accuracy of uncertain human body parts in the flexible hybrid articulated human model by about 15 percentage points, and the key frame extraction accuracy reaches 81%, which is better than that of traditional algorithms. The algorithm is sensitive to both human motion characteristics and human posture and is also suitable for the annotation and evaluation of sports videos, so it offers a reference for improving the level of aerobics training.


Introduction
Aerobics aims to show athletes' ability to complete difficult movements perfectly and to sustain complex, high-intensity exercise to the accompaniment of music. Scientific training is the basis and key link of excellent aerobics performance, and how to carry it out has become an important research topic [1]. Traditional aerobics training methods rely on the analysis of previous competitions and training cases; they lack auxiliary teaching and real-time analysis, and their training efficiency is low. Artificial intelligence and computer-aided systems form a branch of computer science and technology that includes robotics, language processing, image recognition, and expert systems [2]. In recent years, artificial intelligence technology has been widely used in aerobics teaching and practice, which is of great significance for improving the training quality of aerobics athletes and helping coaches formulate reasonable training plans. However, some teachers' traditional training ideas are deeply rooted, and they find it difficult to accept the new training mode. Many aerobics teachers believe that aerobics training does not need the aid of multimedia teaching: the teacher only needs to demonstrate the movements and then let students practice [3]. This traditional teaching concept exists mainly among older aerobics teachers, whose deep-rooted habits and poor adaptability make it difficult for them to accept new ideas and keep pace with the times. There are also loopholes in Computer-Aided Instruction (CAI) systems. Although a few schools in our country have begun to use computer-aided teaching systems for aerobics teaching [4], their scope of application is limited and the systems still have gaps. For example, the established systems lack a teaching feedback module, so information exchange with students is poor and students cannot learn well.
It is also difficult for teachers to understand students' views and opinions on aerobics teaching. In addition, some teachers have come to depend on the computer-aided instruction system and no longer fully play their guiding role. The CAI system is only an auxiliary means; the core of teaching should be the interaction between teachers and students. However, many teachers have developed a dependence on the system, and the aerobics teaching mode has shifted from teacher demonstration to video demonstration followed by students' self-practice.
This dependence on the computer-aided instruction system not only fails to further improve the effect of students' aerobics practice but also leaves students confused under the looser supervision of teachers, which affects teaching quality [5].
Therefore, although the computer-aided instruction system takes over a large share of teachers' work, the teacher's guiding role remains irreplaceable. In actual training, teachers need to keep the computer-aided instruction system in its place as one component of the teaching process and avoid letting it dominate aerobics teaching.
Given these problems in the practice of intelligent computer-aided aerobics training, how to carry out aerobics training with intelligent auxiliary software deserves in-depth research. Therefore, starting from the design of a computer-aided aerobics training practice system and targeting the problems of existing intelligent computer-aided aerobics training systems, this paper proposes an aerobics feature extraction method based on intelligent computer-aided modeling and gives an aerobics evaluation scheme. First, to maintain the spatiotemporal continuity of the FMP model in human pose estimation from motion video, an ST-FMP model with spatiotemporal continuity is obtained by establishing temporal continuity constraints between the vertex pairs of human body parts in adjacent frames; the ST-FMP model is then simplified by embedding spatiotemporal constraints only between uncertain human body parts in adjacent video frames. The N-best optimization algorithm is used to estimate the human pose parameters, which improves the efficiency of solving for the pose parameters in a motion video. Then, the relative position and motion direction of each body part are used to describe its motion characteristics, and the Laplacian score algorithm is used for feature selection to form a locally discriminative human motion feature vector. Next, the key frames of the motion video are determined by the ISODATA dynamic clustering algorithm. Finally, the effectiveness of the intelligent assistant software in aerobics training is analyzed through experiments.

Related Works
An intelligent computer-aided aerobics training application system is the basis for realizing intelligent-assisted training. When constructing such a system, its framework must be determined first; the framework of the intelligent computer-aided aerobics training system is shown in Figure 1. Determining the framework is the first step in establishing the intelligent computer-aided instruction system, and it must be done with a clear understanding of the purpose and tasks of aerobics training. Then, taking into account the focus of the system, the overall framework is designed on the basis of the syllabus. To make aerobics teaching activities more effective, the overall framework should be concise, intuitive, clear, and interrelated [6]. Based on aerobics training materials, the computer-aided teaching system is divided into four main modules: a basic action theory module, a training site and evaluation principle module, an action training and difficult motion playback module, and an assessment and evaluation module. The basic action theory module mainly includes units on common research methods for aerobics technical movements, sports teaching theory, training methods, and related topics. The training site and evaluation principle module mainly includes aerobics formation layout, clothing specifications and related decoration, judging standards, referees' responsibilities, analysis of common judging errors of aerobics referees, and a troubleshooting unit in the form of human-computer interaction. The action training and difficult motion playback module mainly includes a technical action analysis unit, an essentials analysis unit, a basic teaching method unit, an error-prone action analysis unit, a standard action demonstration unit, a key and difficult motion analysis unit, and an action video decomposition teaching unit.
The CAI system contains rich teaching resources, including text, animation, video, sound, and pictures, so collecting relevant materials is very important when establishing it. To enrich the teaching resources of the CAI system as much as possible, the material preparation stage follows these procedures: (i) Experienced aerobics teachers are asked to determine the scope of the CAI materials and to select qualified materials according to specific teaching objectives and tasks. Once the materials are determined, the design text can be written, the sound recorded, and appropriate background music chosen. (ii) Video teaching resources are shot, including videos of standard actions and of wrong actions, for students to study and learn from. (iii) Videos of referees in aerobics competitions are taken.
Using a JVC digital camera, the organization, division of labor, and gestures of aerobics competition referees are completely recorded. Finally, with the help of 3D animation technology and image processing software, the materials are assembled into teaching resources with teaching value and inserted into the corresponding sections once the computer-aided instruction system is completed.

Scientific Programming
Finally, the computer-aided instruction system can be established. Authorware 6.0 is a flowchart-based multimedia visual development tool, so it can be selected as the platform for the aerobics computer-aided teaching system [7]. On the Authorware 6.0 platform, the intelligent computer-aided aerobics teaching system can easily classify the aerobics teaching modules and store the corresponding teaching resources. Navigation icons are used to switch between modules, framework icons are used to turn the pages of different teaching contents, text scrolling enables continuous reading of long texts, and the "hot spot response" and "button response" of the interaction icon let students switch easily among images, text, and video and use the teaching resources conveniently.
Both students and teachers can obtain the required teaching resources in the corresponding window simply by clicking the relevant module, and can easily jump between modules. At the same time, the human-computer interaction mode reveals the problems in students' aerobics learning in more detail so that they can be continuously corrected, which greatly improves the effect of aerobics teaching. In addition, the real-time analysis and assessment mode of the comprehensive evaluation module exercises students' observation, analysis, and comprehensive control abilities, makes aerobics teaching more interesting, and encourages students to keep exploring and to make progress in the process. The intelligent computer-aided instruction system overcomes the monotonous teaching mode and poor teaching effect of the past, stimulates students' enthusiasm for learning, and substantially improves teaching quality.

Human Pose Estimation Model with Spatiotemporal Features.
With the application of intelligent computer-aided methods in aerobics training, there is an urgent need for an adaptive key frame extraction technique that accurately reflects the characteristics of human body movements. Considering the good performance of pose recognition methods based on the rigid human body model [8], this paper proposes embedding the temporal and spatial characteristics of human body parts in the flexible hybrid articulated human model (FMP), so as to improve the robustness of human body and motion recognition, and determining the key frames of an aerobics athlete's action video from the human pose parameters and motion characteristics. In the FMP model, the different affine deformations (such as rotation or bending) of each body part are called the mixed types of that part [9]. Because the same body part corresponds to several mixed types, the human pose parameters in an image $I$ are determined by the position and the mixed type of each part. In general, a $K$-relation graph $G = (V, E)$ is used to describe a pose in $I$: the vertex set $V$ represents the human body parts (such as the head, upper limbs, and trunk), and the edge set $E \subseteq V \times V$ represents the consistency constraints between different body parts. According to the definition of FMP, estimating the human pose $p$ in an image $I$ can be formalized as a cost minimization problem:

$$C(I, p) = \sum_{u \in V} \phi_u(I, p_u) + \sum_{(u,v) \in E} \varphi_{u,v}(p_u - p_v), \qquad p^* = \arg\min_{p} C(I, p) \qquad (1)$$

where $\phi_u(I, p_u)$ is the appearance model, which represents the cost of recognizing body part $u$ at position $p_u$ of image $I$, and $\varphi_{u,v}(p_u - p_v)$ is the deformation model (usually assumed to be a spring energy model), which represents the deformation cost between two body parts $u$ and $v$.
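When the edge set $E$ forms a tree, formula (1) can be minimized exactly by dynamic programming over discrete candidate locations. The Python sketch below is a minimal illustration under stated assumptions, not the FMP implementation. It ignores mixed types, assumes part 0 is the root and `parent[u] < u` for every other part, and takes precomputed appearance costs as input. It also uses a hypothetical quadratic spring $\varphi_{u,v}$ with rest offset `rest[u]` and weight `w_def`.

```python
import numpy as np


def min_sum_tree(parent, locs, app_cost, rest, w_def=1.0):
    """Exactly minimize a tree-structured cost of the form of formula (1).

    parent[u]: parent of part u (parent[0] == -1 for the root; parent[u] < u otherwise).
    locs[u]: (K_u, 2) candidate locations of part u.
    app_cost[u]: (K_u,) appearance costs phi_u at those candidates.
    rest[u]: hypothetical expected offset of part u from its parent (unused for the root).
    Returns the chosen candidate index of every part and the minimal total cost.
    """
    n = len(parent)
    locs = [np.asarray(l, dtype=float) for l in locs]
    # total[u][j]: best cost of the subtree rooted at u when u sits at candidate j
    total = [np.asarray(c, dtype=float).copy() for c in app_cost]
    best_child = {}
    for u in range(n - 1, 0, -1):  # children are visited before their parents
        v = parent[u]
        # offsets of every child candidate from every parent candidate: (K_v, K_u, 2)
        diff = locs[u][None, :, :] - locs[v][:, None, :] - rest[u]
        deform = w_def * np.sum(diff ** 2, axis=2)  # quadratic spring energy
        scores = deform + total[u][None, :]
        best_child[u] = np.argmin(scores, axis=1)  # best child candidate per parent candidate
        total[v] += np.min(scores, axis=1)
    # pick the root candidate, then backtrack from parents to children
    assign = [0] * n
    assign[0] = int(np.argmin(total[0]))
    for u in range(1, n):
        assign[u] = int(best_child[u][assign[parent[u]]])
    return assign, float(total[0][assign[0]])
```

For a star-shaped model every `parent[u]` is 0. The cost grows linearly with the number of parts and quadratically with the number of candidates per part.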
When the FMP model is used to estimate body posture in an aerobics athlete's action video, a temporal feature edge is added between $p_u^t$ and $p_u^{t+1}$ in adjacent frames $I^t$ and $I^{t+1}$ to maintain the continuity of the human pose parameters in time and space. This forms a flexible hybrid articulated human model with spatiotemporal characteristics (ST-FMP) [4]. In ST-FMP, the continuity error of the human pose defined by a temporal feature edge is computed from the optical-flow difference between $p_u^t$ and $p_u^{t+1}$:

$$\theta(p_u^t, p_u^{t+1}) = \left\| p_u^{t+1} - \left( p_u^t + f(p_u^t) \right) \right\|^2 \qquad (2)$$

where $f(p_u^t)$ is the optical flow from $I^t$ to $I^{t+1}$ estimated at $p_u^t$. Suppose that the frame sequence of an aerobics athlete's action video is $\mathbf{I} = \{I^1, I^2, \ldots, I^T\}$ and the estimated pose sequence is $\mathbf{P} = \{p^1, p^2, \ldots, p^T\}$. Then, using the ST-FMP model, the cost of $\mathbf{I}$ is

$$C(\mathbf{I}, \mathbf{P}) = \sum_{t=1}^{T} C(I^t, p^t) + \lambda_1 \sum_{t=1}^{T-1} \sum_{u \in V} \theta(p_u^t, p_u^{t+1}) \qquad (3)$$

where $C(I^t, p^t)$ is the cost of estimating the human pose $p^t$ from image $I^t$ according to formula (1), $t$ is the frame index of the action video, $\lambda_1$ is a normalization constant, and $\theta(\cdot)$ is the spatiotemporal continuity error given by formula (2).
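Assume the temporal error $\theta$ is the squared distance between $p_u^{t+1}$ and $p_u^t$ displaced by its flow $f(p_u^t)$. Under that assumption, the video cost of formula (3) can be evaluated for a given pose sequence as in the sketch below. The per-frame costs $C(I^t, p^t)$ and the sampled flow are taken as precomputed inputs, and the function names are hypothetical.

```python
import numpy as np


def temporal_error(p_t, p_next, flow_at_p_t):
    """Flow-compensated squared distance between p_u^t and p_u^{t+1} (assumed form of theta)."""
    predicted = np.asarray(p_t, float) + np.asarray(flow_at_p_t, float)
    return float(np.sum((np.asarray(p_next, float) - predicted) ** 2))


def video_cost(frame_costs, poses, flows, lam=1.0):
    """Cost of a pose sequence in the form of formula (3).

    frame_costs: length-T sequence of per-frame costs C(I^t, p^t).
    poses: (T, |V|, 2) positions p_u^t of every part in every frame.
    flows: (T-1, |V|, 2) optical flow from I^t to I^{t+1} sampled at p_u^t.
    lam: the normalization constant lambda_1.
    """
    poses = np.asarray(poses, float)
    flows = np.asarray(flows, float)
    total = float(np.sum(frame_costs))
    for t in range(poses.shape[0] - 1):
        for u in range(poses.shape[1]):
            total += lam * temporal_error(poses[t, u], poses[t + 1, u], flows[t, u])
    return total
```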

ST-FMP Solution Based on Uncertain Position Optimization.
FMP is a tree-structured, part-based human body model that can be represented by a Markov random field (MRF), and the parameters of the body parts can be learned with machine learning methods. When FMP is used to estimate the human pose in a single frame, the MRF has a tree or star structure and can be solved by belief propagation (BP). With the introduction of temporal constraints, however, the ST-FMP graph contains a large number of loops, so minimizing formula (3) requires an approximate algorithm such as loopy belief propagation (LBP) [10]. The cost of inference on such a loopy graph is governed by its maximal cliques and grows exponentially, so the LBP-based approach has low time efficiency for pose estimation in long videos. Therefore, this paper designs a two-stage ST-FMP algorithm based on uncertain human body parts.

Generating the Candidate Human Pose Sets.
First, the N-best algorithm and formula (1) are used to generate $K_b$ candidate human poses for each frame in $O(K_b^2 T)$ time without considering the spatiotemporal continuity constraints on human posture [11]. Because of motion blur and self-occlusion, some body parts in these candidates are difficult to estimate accurately. These include the elbows (LE, RE), wrists (LW, RW), knees (LK, RK), and ankles (LA, RA), marked by the eight white points in Figure 2(a). Therefore, this paper introduces the uncertain parts of the pose $p^{t+1}$ in $I^{t+1}$ into the pose estimation of $I^t$. With the help of the local temporal continuity of these body parts (represented by the four dotted lines in Figure 2(a)), the accuracy of human pose estimation is improved as follows:

$$(\hat{p}_W^t, \hat{p}_W^{t+1}) = \arg\min_{p_W^t,\, p_W^{t+1}} \left[ C(I^t, p_W^t) + C(I^{t+1}, p_W^{t+1}) + \lambda_1 \sum_{u \in W} \theta(p_u^t, p_u^{t+1}) \right] \qquad (4)$$

where $W \subset V$ is the set of uncertain parts, $p_W^t$ denotes the positions of the parts in $W$ at frame $t$, and $\lambda_1$ is the normalization constant. $C(\cdot)$ has the same meaning as in formula (1), except that it only accounts for the appearance and deformation costs of the uncertain parts in $W$.
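The refinement of one uncertain part can be sketched as a brute-force search over candidate pairs in frames $t$ and $t+1$. As a simplifying assumption, each part in $W$ is treated independently, with any deformation costs between uncertain parts folded into the per-candidate costs. Here $\theta$ is taken as the flow-compensated squared distance, and the names are illustrative only.

```python
import numpy as np


def refine_uncertain_part(cand_t, cost_t, cand_next, cost_next, flow_t, lam=1.0):
    """Pick the (frame t, frame t+1) candidate pair for one uncertain part.

    cand_t: (K1, 2) candidate positions of the part in I^t; cost_t: (K1,) their costs.
    cand_next: (K2, 2) candidate positions in I^{t+1}; cost_next: (K2,) their costs.
    flow_t: (K1, 2) optical flow from I^t to I^{t+1} sampled at each cand_t.
    Returns indices (j, k) minimizing cost_t[j] + cost_next[k] + lam * theta, and that value.
    """
    cand_t = np.asarray(cand_t, float)
    cand_next = np.asarray(cand_next, float)
    predicted = cand_t + np.asarray(flow_t, float)  # where each frame-t candidate should move
    # temporal error for every pair of candidates: (K1, K2)
    theta = np.sum((cand_next[None, :, :] - predicted[:, None, :]) ** 2, axis=2)
    total = (np.asarray(cost_t, float)[:, None]
             + np.asarray(cost_next, float)[None, :]
             + lam * theta)
    j, k = np.unravel_index(np.argmin(total), total.shape)
    return int(j), int(k), float(total[j, k])
```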

Determining the Optimal Solution.
After the $K_b$ candidate poses of each frame have been obtained, dynamic programming is used to determine the optimal pose of each frame in $O(K_b^2 T)$ time by minimizing formula (3) over the candidates:

$$\mathbf{P}^* = \arg\min_{p^t \in \{p^{t,1}, \ldots, p^{t,K_b}\},\; t = 1, \ldots, T} C(\mathbf{I}, \mathbf{P}) \qquad (5)$$

Since both stages of the ST-FMP method can be completed in $O(K_b^2 T)$ time, the two-stage ST-FMP algorithm based on uncertain human body parts has a significantly lower time complexity than the LBP-based ST-FMP algorithm.
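The second stage can be sketched as a Viterbi-style dynamic program over the $K_b$ candidates per frame. This is an illustration, not the paper's code. `unary[t, k]` stands for $C(I^t, p^{t,k})$, and `pair[t, j, k]` stands for the summed temporal error $\sum_u \theta$ between candidate $j$ of frame $t$ and candidate $k$ of frame $t+1$. Both are assumed to be precomputed. The $K_b \times K_b$ comparison per frame transition gives the $O(K_b^2 T)$ cost.

```python
import numpy as np


def select_pose_sequence(unary, pair, lam=1.0):
    """Choose one candidate pose per frame minimizing unary + lam * pairwise costs.

    unary: (T, K) per-frame candidate costs.
    pair: (T-1, K, K) temporal errors between consecutive candidates.
    Returns the chosen candidate index per frame and the minimal total cost.
    """
    unary = np.asarray(unary, float)
    pair = np.asarray(pair, float)
    T, K = unary.shape
    D = unary[0].copy()  # best cost of a path ending at each candidate of the current frame
    back = np.zeros((T, K), dtype=int)  # back-pointers to the best previous candidate
    for t in range(1, T):
        scores = D[:, None] + lam * pair[t - 1]  # (K_prev, K_cur)
        back[t] = np.argmin(scores, axis=0)
        D = scores[back[t], np.arange(K)] + unary[t]
    path = [int(np.argmin(D))]
    for t in range(T - 1, 0, -1):  # follow back-pointers from the last frame
        path.append(int(back[t][path[-1]]))
    return path[::-1], float(np.min(D))
```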

Description of Motion Characteristics of Human Body Parts.
At present, the human motion models used for motion capture and motion recognition cannot be applied directly because accurate motion parameters such as joint angular velocity and displacement velocity are unavailable. Therefore, this paper designs a human motion feature description model based on the relative positions and motion directions of the human body parts [12]. A coordinate system is established with the upper left corner of a frame as the origin, the width and height directions as the $x$ and $y$ axes, and pixels as the unit. Let $C^u = (C_x^u, C_y^u)$ be the center of body part $p_u$ and $(C_x, C_y)$ the position of the body's center of gravity. The position information $l_u = (x_u, y_u)$ of a body part is defined as the position of $(C_x^u, C_y^u)$ relative to $(C_x, C_y)$:

$$l_u = (x_u, y_u) = (C_x^u - C_x,\; C_y^u - C_y) \qquad (6)$$

At the same time, the motion direction $V_u^t$ of $p_u$ at time $t$ is defined as the composite vector of the motion directions of all moving points in $p_u$:

$$V_u^t = \sum_{i \in \xi_u^t} v_{u,i}^t \qquad (7)$$

where $v_{u,i}^t$ is the motion direction of the $i$th pixel in $p_u$ at time $t$ and $\xi_u^t$ is the set of all moving points in $p_u$ at time $t$. In this paper, $v_{u,i}^t$ is estimated from the dense optical flow between the adjacent frames $I^t$ and $I^{t+1}$. With formulas (6) and (7), the feature matrix composed of the motion directions and relative positions of the body parts can represent the human motion features in a frame. Suppose that $j_u^t = (x_u, y_u, V_u^t) \in \mathbb{R}^3$ is the motion characteristic of $p_u$ at time $t$ and that $J_t$ is the motion feature of frame $I^t$ containing $d$ ($d = 26$) body parts. Then the motion characteristics of an aerobics athlete's action video with $T$ frames can be expressed as

$$J_t = \left( j_1^t, j_2^t, \ldots, j_d^t \right)^{\top}, \qquad F = \left[ J_1, J_2, \ldots, J_T \right] \qquad (8)$$

According to this definition, $J_t$ is a 78-dimensional motion vector ($3 \times 26 = 78$) and $F$ is a $78 \times T$ matrix.
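The features of formulas (6) to (8) can be assembled as sketched below. Because $j_u^t$ is stated to lie in $\mathbb{R}^3$, this sketch assumes that the composite motion vector $V_u^t$ is reduced to its direction angle. The flow vectors of the moving pixels in each part are assumed to be given, for example by a dense optical flow method. The function names are hypothetical.

```python
import numpy as np


def frame_feature(part_centers, body_center, part_flows):
    """Per-frame motion feature J_t = (x_1, y_1, V_1, ..., x_d, y_d, V_d).

    part_centers: (d, 2) part centers (C_x^u, C_y^u) in pixel coordinates.
    body_center: (2,) center of gravity (C_x, C_y).
    part_flows: d arrays of shape (n_u, 2), flow vectors of the moving pixels of each part.
    """
    rel = np.asarray(part_centers, float) - np.asarray(body_center, float)  # formula (6)
    angles = []
    for flows in part_flows:
        v = np.asarray(flows, float).reshape(-1, 2).sum(axis=0)  # composite vector, formula (7)
        angles.append(np.arctan2(v[1], v[0]))  # assumption: keep only its direction
    return np.column_stack([rel, angles]).ravel()


def video_feature_matrix(frame_features):
    """Stack per-frame features as columns: a (3d, T) matrix, 78 x T for d = 26."""
    return np.stack(frame_features, axis=1)
```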
Experiments show that key frame extraction in the $78 \times T$ high-dimensional motion feature space has high time complexity, and the large amount of redundant data and noise directly affects extraction accuracy. To improve the ability of the motion vectors to express local features, the Laplacian score (LS) [13] is used to reduce the dimension of the motion vectors and identify the more discriminative human motion features. First, a $k$-nearest neighbor graph $G_k$ is constructed. Then, the similarity of each pair of connected nodes in $G_k$ is computed with the heat kernel function to obtain the Laplacian score $L_r$ of the $r$th motion feature. Finally, the first $n$ ($1 \le n \le 3d$) motion features with the smallest $L_r$ are taken as the body posture feature vector of the aerobics athlete's action video.
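A minimal Laplacian score computation, following the standard definition by He, Cai, and Niyogi, is sketched below. Rows of `X` are frames and columns are the $3d$ motion components, that is, the transpose of the $78 \times T$ matrix. The neighborhood size `k` and heat-kernel width `t` are hypothetical defaults.

```python
import numpy as np


def laplacian_scores(X, k=5, t=1.0):
    """Laplacian score of every column of X (smaller means more locality preserving).

    X: (n_samples, n_features); here samples are frames and features are motion components.
    k: neighbors in the k-nearest-neighbor graph; t: heat-kernel width.
    """
    X = np.asarray(X, float)
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise squared distances
    S = np.zeros((n, n))
    for i in range(n):
        neighbors = [j for j in np.argsort(sq[i]) if j != i][:k]
        for j in neighbors:
            S[i, j] = S[j, i] = np.exp(-sq[i, j] / t)  # heat-kernel edge weight
    d = S.sum(axis=1)
    L = np.diag(d) - S  # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r] - (X[:, r] @ d) / d.sum()  # remove the degree-weighted mean
        denom = f @ (d * f)
        scores[r] = (f @ L @ f) / denom if denom > 1e-12 else np.inf
    return scores


def select_features(X, n_keep, k=5, t=1.0):
    """Indices of the n_keep features with the smallest Laplacian scores."""
    return np.argsort(laplacian_scores(X, k, t))[:n_keep]
```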

Aerobics Evaluation Based on Intelligent Computer-Aided Modeling
The differences in human body movement between two aerobics athletes' action videos usually lie mainly in body posture, shape, and movement speed. Therefore, this paper uses HOF and HOG to extract the human motion features in aerobics athletes' action videos and encodes these features with the Fisher vector technique.
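The text does not spell out its FV formulas at this point. The sketch below therefore uses the widely used improved Fisher vector as a stand-in: gradients with respect to the means and standard deviations of a diagonal GMM, followed by power and L2 normalization. The GMM parameters are assumed to have been trained beforehand on PCA-reduced HOG/HOF descriptors. The spatiotemporal pyramid of the STLPFV method is not included.

```python
import numpy as np


def fisher_vector(X, weights, means, sigmas):
    """Improved Fisher vector of local descriptors under a diagonal GMM.

    X: (N, D) descriptors of one video; weights: (K,) mixture weights;
    means, sigmas: (K, D) component means and standard deviations.
    Returns a 2*K*D vector (mean part followed by standard-deviation part).
    """
    X = np.asarray(X, float)
    weights = np.asarray(weights, float)
    means = np.asarray(means, float)
    sigmas = np.asarray(sigmas, float)
    N = X.shape[0]
    z = (X[:, None, :] - means[None]) / sigmas[None]  # standardized residuals (N, K, D)
    log_p = -0.5 * np.sum(z ** 2 + np.log(2 * np.pi * sigmas[None] ** 2), axis=2)
    log_p += np.log(weights)[None, :]
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability before exp
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)  # soft assignments (N, K)
    g_mu = np.einsum("nk,nkd->kd", gamma, z) / (N * np.sqrt(weights)[:, None])
    g_sigma = np.einsum("nk,nkd->kd", gamma, z ** 2 - 1) / (N * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```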

Human Posture Shape Feature Extraction.
The posture feature of body movement in an aerobics video is a static local topological structure of the human body parts. Existing research shows that dense local gradient features (HOG features) can accurately represent static features such as human shape and position even without precise knowledge of where the gradients are located [14]. Therefore, the posture of aerobics athletes in an action video can be described by HOG features. Figure 3 shows the visualized HOG features of an aerobics athlete.
First, a frame of 240 × 320 pixels is divided into cells of 40 × 40 pixels. Then, the [−1, 0, 1] template is convolved with the image in the horizontal and vertical directions, and the gradient magnitude and direction of each pixel are computed as

$$I_x(x, y) = I(x+1, y) - I(x-1, y), \qquad I_y(x, y) = I(x, y+1) - I(x, y-1)$$

$$M(x, y) = \sqrt{I_x(x, y)^2 + I_y(x, y)^2}, \qquad \theta(x, y) = \arctan \frac{I_y(x, y)}{I_x(x, y)}$$

where $I_x$ and $I_y$ are the horizontal and vertical gradient values, $M(x, y)$ is the gradient magnitude, and $\theta(x, y)$ is the gradient direction. The gradient direction range from 0° to 180° is then divided into 16 bins. At the same time, every four (2 × 2) cells are combined into a block of 80 × 80 pixels, and within each block the histogram of each cell is normalized by the Euclidean norm. Since each cell contains a 16-dimensional feature vector, each block yields a 4 × 16 = 64-dimensional feature vector. The local features of human posture and shape in each frame are represented by a HOG feature vector of dimension 6 × 8 × 16 = 768.
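The HOG computation can be sketched with NumPy as follows. To keep the output at the stated 768 dimensions, the 2 × 2 block normalization is simplified here to L2 normalization of each cell histogram. This is an assumption rather than the paper's exact procedure. Magnitude-weighted voting into the nearest orientation bin is also assumed.

```python
import numpy as np


def hog_cells(img, cell=40, bins=16):
    """Cell-level HOG for a grayscale frame; a 240 x 320 frame gives 6*8*16 = 768 values."""
    img = np.asarray(img, float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]  # [-1, 0, 1] template, horizontal
    gy[1:-1, :] = img[2:, :] - img[:-2, :]  # [-1, 0, 1] template, vertical
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned direction in [0, 180)
    b = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            # magnitude-weighted vote of every pixel in the cell
            hist[r, c] = np.bincount(b[sl].ravel(), weights=mag[sl].ravel(), minlength=bins)
    # assumption: L2-normalize each cell histogram instead of full block normalization
    hist /= np.maximum(np.linalg.norm(hist, axis=2, keepdims=True), 1e-12)
    return hist.ravel()
```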

Human Motion Feature Extraction.
According to the principle of the optical flow field, the gray-level changes of the human body image between adjacent video frames reflect the motion characteristics of the human body well [15]. Therefore, this paper uses the Lucas–Kanade algorithm (LK algorithm for short) [16] to estimate the optical flow vector of each point in the image, and the motion information of the human body is described by the histogram of optical flow (HOF) feature [17]. The LK algorithm assumes that the optical flow is constant within a small local neighborhood of adjacent frames, so the velocity of human motion from time $t$ to time $t + \Delta t$ can be determined by the least-squares method. After the optical flow of each pixel has been obtained with the LK algorithm, a 240 × 320-pixel frame is divided into 6 × 8 = 48 cells. The optical flow vectors in each cell are quantized into 16 direction bins and normalized, giving a 6 × 8 × 16 = 768-dimensional HOF feature vector.
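Both steps can be sketched as follows. The first function is a least-squares Lucas–Kanade estimate at one pixel, assuming constant flow in a small window; `half` is a hypothetical window half-width. The second computes a HOF descriptor from a precomputed flow field. Binning flow directions over the full 0° to 360° range, magnitude weighting, and per-cell L2 normalization are assumptions.

```python
import numpy as np


def lucas_kanade_point(I0, I1, x, y, half=2):
    """Least-squares flow (u, v) at pixel (x, y), assuming constant flow in a (2*half+1)^2 window."""
    I0 = np.asarray(I0, float)
    I1 = np.asarray(I1, float)
    Iy, Ix = np.gradient(I0)  # derivatives along rows (y) and columns (x)
    It = I1 - I0  # temporal derivative
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    A = np.column_stack([Ix[win].ravel(), Iy[win].ravel()])
    b = -It[win].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)  # solves Ix*u + Iy*v = -It
    return float(u), float(v)


def hof(flow, cell=40, bins=16):
    """Histogram of optical flow per cell; a (240, 320, 2) flow field gives 6*8*16 = 768 values."""
    flow = np.asarray(flow, float)
    mag = np.hypot(flow[..., 0], flow[..., 1])
    ang = np.degrees(np.arctan2(flow[..., 1], flow[..., 0])) % 360.0  # signed direction
    b = np.minimum((ang / (360.0 / bins)).astype(int), bins - 1)
    rows, cols = flow.shape[0] // cell, flow.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(b[sl].ravel(), weights=mag[sl].ravel(), minlength=bins)
    hist /= np.maximum(np.linalg.norm(hist, axis=2, keepdims=True), 1e-12)  # per-cell L2
    return hist.ravel()
```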

Automatic Scoring of Aerobics Athletes' Action Video.
To score aerobics athletes' action videos automatically, the videos in the training dataset are divided into three categories according to the manual scores and the quality of action completion. The STLPFV algorithm is used to obtain the action features of each video segment, and the k-nearest neighbor (KNN) algorithm is then used to model the motion features. When a video is scored automatically, the FV code of the target athlete's action video is generated by the STLPFV algorithm [18]. Finally, the target video is classified by its KNN distances to the action videos of the three score categories, and the least-squares method is used for automatic scoring [19]. Let $C = \{c_l, c_m, c_h\}$ be the set of classification centers of the three action categories obtained by the KNN algorithm. Let the triple $\mathrm{KNN}(s_i) = [p_i^{(l)}, p_i^{(m)}, p_i^{(h)}]$ be the KNN distance vector between the target video $s_i$ and $C$, where $p_i^{(l)}$, $p_i^{(m)}$, and $p_i^{(h)}$ are the probabilities that $s_i$ belongs to the low-, medium-, and high-score action videos, respectively. Then the weight $w_i$ of $s_i$ on $C$ is given by formula (9).

The weight $w_i$ obtained by formula (9) must be calibrated into the corresponding action score. In this paper, the least-squares method and the action video training set are used to establish a linear model between $w_i$ and the action score. Specifically, let $Y^{(\alpha)} = \{ y_i^{(\alpha)} \mid i = 1, 2, \ldots, n \}$ be the score set of each action category in $C$. Here $n$ is the number of action videos, and $\alpha \in \{l, m, h\}$ denotes the three types of action videos with different scores (low, medium, and high). The coefficients are obtained by

$$\rho^{(\alpha)} = \arg\min_{\rho_0, \rho_1} \sum_{i=1}^{n} \left( y_i^{(\alpha)} - \rho_0 - \rho_1 w_i \right)^2 \qquad (10)$$

where $\rho^{(\alpha)} = (\rho_0^{(\alpha)}, \rho_1^{(\alpha)})$ is the coefficient vector obtained by solving formula (10) on the action video training set. The score of the target video $s_i$ is then defined as

$$y_i = \rho_0^{(\alpha)} + \rho_1^{(\alpha)} w_i \qquad (11)$$

where $\alpha$ is the score category to which $s_i$ is assigned. The flow of the automatic scoring algorithm for aerobics athletes' action videos based on the STLPFV method is shown in Figure 4. It includes the following steps: (1) The STLPFV method is used to compute the human motion characteristics in the action video $s_i$ and obtain its FV code. (2) The KNN distance between $s_i$ and $C$ is computed by the KNN algorithm to obtain the probabilities $\mathrm{KNN}(s_i)$ that $s_i$ belongs to each action score category in $C$. (3) The weight $w_i$ is calculated by formula (9), and the score of $s_i$ is obtained from formula (11).
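The calibration of formulas (10) and (11) is ordinary least squares and can be sketched as below. Formula (9) is not reproduced in the text, so the weight $w_i$ is taken as an input. The neighbor-voting estimate of the class probabilities is a hypothetical stand-in for $\mathrm{KNN}(s_i)$, with labels 0, 1, and 2 for low, medium, and high scores.

```python
import numpy as np


def knn_probabilities(fv, train_fvs, train_labels, k=5):
    """Share of the k nearest training FVs in each class (0 = low, 1 = medium, 2 = high)."""
    dist = np.linalg.norm(np.asarray(train_fvs, float) - np.asarray(fv, float), axis=1)
    nearest = np.asarray(train_labels, dtype=int)[np.argsort(dist)[:k]]
    return np.bincount(nearest, minlength=3) / k


def fit_linear_score(w, y):
    """Least-squares coefficients (rho_0, rho_1) of y ~ rho_0 + rho_1 * w, as in formula (10)."""
    A = np.column_stack([np.ones(len(w)), np.asarray(w, float)])
    rho, *_ = np.linalg.lstsq(A, np.asarray(y, float), rcond=None)
    return rho


def predict_score(w_i, rho):
    """Score of a target video from its weight w_i, as in formula (11)."""
    return float(rho[0] + rho[1] * w_i)
```

In use, one `rho` would be fitted per score category on the training set and applied to the videos assigned to that category.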

Experiment and Analysis
This paper takes key frame extraction from aerobics action videos as an example for the simulation experiments. The experimental results are compared with manual extraction results and with the results of a recent key frame extraction algorithm for aerobics athletes' action videos [20].

Training and Evaluation Criteria of Experimental Data Samples and Characteristics.
First, three students were each invited to perform a 120 s public aerobics routine twice, and videos with 640 × 480 resolution were recorded by an ordinary webcam at a sampling rate of 20 frames/s. Then, starting from the 10th rhythm, 300 frames were selected from each of the six videos as experimental data. Finally, the 13 joint positions of each aerobics movement were manually marked in the resulting 1800 frames according to Figure 2(a). In the simulation experiment, the 900 frames from the first pass are used as the training sample set and the 900 frames from the second pass as the test sample set.
In the process of training, in order to enrich the human motion characteristics of the positive samples, the rotation (according to the four angles of −15°, −7.5°, 7.5°, and 15°) and
All the key frames in the test sample set are extracted manually, and the common key frame extraction accuracy is used as the evaluation standard of algorithm performance [21]. The accuracy is computed by comparing the manually extracted key frames with those extracted by the algorithm. Here $n$ and $m$ are the numbers of key frames extracted manually and by the algorithm, respectively, and $f_i$ and $r_i$ are the corresponding key frames. $\delta(\cdot)$ is the similarity function between $f_i$ and $r_i$; its value is 1 when $f_i$ and $r_i$ are the same and 0 otherwise.

Comparison of the Effectiveness of Spatiotemporal Feature Embedding in Uncertain Parts.
To test the accuracy of the human model with spatiotemporal features embedded in the uncertain parts, three different ST-FMP variants are used to estimate the human body parts in the video. Comparative experiments on the elbow and knee are carried out under different pixel error thresholds, and the results are shown in Figure 5. Compared with the FMP model, using the ST-FMP algorithm to estimate the human pose in aerobics athletes' action videos significantly improves the accuracy of the uncertain parts within a certain pixel error range. At an error threshold of 20 pixels, for example, the elbow and knee accuracies of the ST-FMP algorithm are about 15% and 19% higher, respectively, than those of the FMP model. However, when the pixel error threshold is large (e.g., greater than 40 pixels) or small (e.g., less than 10 pixels), the difference in accuracy is not significant.
Figure 5 also shows the effect of maintaining the temporal continuity of only the uncertain parts of the upper limbs (lower limbs). In that case, the recognition accuracy of the elbow and wrist (knee and ankle) is higher than that of the FMP model but lower than that of the full ST-FMP algorithm. These results show that the ST-FMP algorithm refines the recognition of body parts with local temporal continuity constraints and thereby significantly improves the recognition of uncertain parts in aerobics athletes' action videos.
This paper also compares the performance of the FMP model, the ST-FMP model, and their different implementations in key frame extraction from aerobics athletes' action videos. The results are shown in Figure 6.
The results in Figure 6 support two conclusions: (1) When the error threshold is less than 30 pixels, the ST-FMP algorithm is more accurate and more stable, about 11% higher than the FMP algorithm; in addition, adding the temporal continuity constraint of the upper limbs (lower limbs) improves the key frame extraction accuracy of the FMP model by about 3%. (2) When the error threshold is greater than 35 pixels, the ST-FMP algorithm still outperforms the FMP model, but its accuracy drops by about 15%. Moreover, with an error threshold of 30 pixels, the accuracy curve of key frame extraction by the ST-FMP algorithm does not fluctuate sharply as the number of motion features changes. Its performance is stable for 15–60 motion features, as shown in Figure 7. These results show that the ST-FMP algorithm is sensitive to the human pose estimation results and the local topological structure of the human body, and that the spatiotemporal constraints on uncertain parts play an important role in key frame extraction performance.

Performance Comparison of Key Frame Algorithms.
To compare the performance of key frame algorithms, the simulation results of the ST-FMP algorithm are compared in Table 1 with those of two algorithms: the prior-based KFE algorithm in reference [22] and the motion block-based key frame extraction algorithm (the motion block algorithm) in reference [23]. The results in Table 1 show that the ST-FMP algorithm outperforms the other two algorithms in both accuracy and recall. First, the accuracy of the ST-FMP algorithm is about 18% and 26% higher than that of the KFE algorithm and the motion block algorithm, respectively. The KFE algorithm uses the predefined motion directions of 16 blocks to represent human motion features, whereas the ST-FMP algorithm uses the first 15 LS-selected human motion pose features of each action video. Therefore, the motion feature vectors of the ST-FMP algorithm contain less redundancy and noise and express the local motion of the body parts accurately, which helps improve the accuracy of key frame extraction and action recognition.
Second, Table 1 shows that the recall of the ST-FMP algorithm is significantly better than that of the other two algorithms, by an average of 23 and 13 percentage points, respectively. The KFE algorithm and the motion block algorithm are key frame techniques based on differences in low-level image features: they select key frames by comparing motion changes in different image regions. The ST-FMP algorithm, in contrast, describes the local motion characteristics of the human body parts and is essentially a semantic model. It can analyze and understand the human actions in aerobics athletes' action videos at a higher level, such as which body parts participate in a movement and how the movement changes over time. By selecting key frames with semantic rules such as human pose similarity, it obtains key frames that are more accurate and more consistent with human cognition. These results show that the ST-FMP algorithm better expresses the local topology of the human body and supports key frame selection based on semantic rules. Its results are closer to manual extraction and better suited to feature-based key frame extraction. Moreover, ST-FMP divides the human body into flexible parts, identifies the human pose from the local topology of these parts, and reduces the continuity error of pose estimation with temporal feature edge constraints. As a result, it is highly robust in complex scenes.

Conclusion
To address the lack of auxiliary teaching conditions and the low training efficiency of traditional aerobics training methods, this paper proposes a local spatiotemporal-preserving Fisher vector (FV) coding method and an automatic scoring technique for monocular motion video. The experimental results show that the ST-FMP model significantly improves recognition accuracy and pose estimation performance by maintaining the spatiotemporal continuity of uncertain body parts and embedding the spatiotemporal characteristics of local actions. The key frame set it produces is also more consistent with human cognition. The proposed algorithm still has aspects that need improvement. For example, the intelligent computer-aided training system lacks real-time tracking and monitoring during aerobics training. Therefore, the next problem to be solved is how to monitor and evaluate aerobics training in real time from the human motion characteristics and posture sensitivity in aerobics training [24].
Data Availability
The dataset can be accessed upon request to the corresponding author.

Conflicts of Interest
The author declares no conflicts of interest.