A Novel Automatic Tracking Method of Moving Image Sequence Marker Points Uses Kinect and Wireless Network Technology

Sports biomechanics is a branch of sports science, a relatively new discipline for studying sports, and its research relies on a range of technical methods to collect biomechanical parameters and variables. In the teaching process, the Kinect somatosensory camera is used for bone tracking to obtain the spatial coordinates of human joint nodes, and the teaching data for imitation learning are obtained after preprocessing. Taking advantage of Microsoft's Kinect sensor for image acquisition, this paper builds a general sports auxiliary training system and designs some of the system's functions in detail. It also studies a Kinect-based method for automatically tracking marker points in sports image sequences. Kinect is used to collect bone point information, establish a virtual coordinate system, obtain each joint vector from the node information, calculate the included angle between the vectors, and then obtain the control angle of each joint, providing data support for imitation representation learning. The movement is counted and scored based on the distance between the jaw and the crossbar and on the bending angle of the arm.


Introduction
In recent years, detecting, identifying, and tracking human body parts of interest in sports image sequences has become one of the hot topics in image processing and computer vision, with very broad application prospects. Sports science is a new science for studying sports, and sports biomechanics is one of its branches. Alongside the rapid development of mass sports, information technology, represented by computer technology, has also progressed rapidly, with personal computers becoming widespread and computing speed improving quickly. The intelligence and low price of these devices have prompted new explorations in sports assistance [1,2]. Using various technical means to collect sports biomechanical parameters and variables is an important part of sports biomechanics research. The current sports education system still falls well short of the ideal requirements. At present, most existing systems use human motion measurement and analysis methods based on high-precision video capture and analysis. Vision is the primary means by which humans obtain information from nature. According to statistics, humans acquire about 60% of their information visually, and images are the primary carrier of visual information [3]. The so-called "graph" [4] refers to the distribution of light transmitted or reflected by an object, while the "image" is the impression or recognition formed in the brain when the human visual system receives the information of the graph. Target tracking is a key technology in computer vision applications such as intelligent monitoring, traffic statistics, and video analysis of sports information. By accurately tracking the motion trajectory of the target, a high-level semantic analysis and understanding of a sports video can be carried out [5,6].
In video tracking, the image sequence of the moving objects is acquired first, so that the analysis of continuously moving objects in the real world is transformed into the analysis of target points in a discrete image sequence. The analysis process includes the following main steps: (1) extracting the moving objects from the image background, (2) marking and locating the positions of the target points in the moving image sequence, and (3) tracking these target points. Among them, the identification, positioning, and tracking of moving targets are the key steps and form the basis of image analysis and understanding.
As a low-cost and reliable markerless depth-based motion capture technology, the Microsoft Kinect sensor has attracted the attention of the motion research community. Taking advantage of the Kinect sensor for image acquisition, this paper constructs a general sports auxiliary training system and designs some functions of the system in detail [7,8]. As a general-purpose depth camera, Kinect has already been applied in sports training to realize human-computer interaction. Scholars at home and abroad have done a lot of research in this area, mainly covering recognition [9], bone tracking, and depth images. In the teaching process, the Kinect somatosensory camera is used for bone tracking to obtain the spatial coordinates of human joint nodes, and the teaching data for imitation learning are obtained after preprocessing. To overcome camera jitter and background jitter, symmetrical difference and neighborhood filtering are used to locate moving targets. Finally, the moving target region is extracted by a projection algorithm, and clutter block removal and region merging are applied to make the extracted moving target region more accurate and continuous [10].
The joints captured by Kinect have been used to recognize human posture, for example, to remotely monitor patients' yoga postures. This paper discusses the application of somatosensory interaction with the Kinect sensor to the national physical fitness test and to self-training [11,12]. Using the depth image collected by Kinect, the height of the horizontal bar is determined from the field of view and the depth value. Bone tracking is used to determine the position of the jaw, and the bending angle of the arm is calculated from the three joint points of the arm. The movement is then counted and scored according to the distance between the jaw and the crossbar and the bending angle of the arm. Combining Kinect, a cheap and powerful somatosensory device, with mass sports creates a new sports teaching method and promotes the rapid development of mass sports [13,14]. We believe that, with continued effort, a high-quality sports teaching system that meets the ideal conditions can be realized. Kinect is used to collect bone point information, establish a virtual coordinate system, obtain the joint vectors from the node information, calculate the included angles between the vectors, and then obtain the control angle of each joint, so as to obtain the motion information of each joint and provide data support for the representation learning [15] of imitation.

Related Work
The sports image sequence marker points are applied to the sports video analysis system, according to literature [16]. The first project involves using the moving target tracking method to automatically place joint points in human motion sequence images. Automatically determining the position of joint points during human movement by computer will greatly reduce the time spent by operators in video analysis, as well as avoid errors caused by operators' analysis fatigue and a variety of other factors. According to the literature [17], if more detailed information about the shape or appearance of the sports image is available before tracking, an a priori model for the target can be created, and then, the target can be actively searched in the sequence of sports images to achieve the tracking goal. These models can be used to predict the target's possible performance in the sequence, reduce the feature space to be searched, and improve the algorithm's robustness in the target tracking process. Because the human body has uniform chromaticity and no obvious feature points and various joint parts of the human body deform to varying degrees during movement, the general sports image processing method applied to the recognition of human joint points has some difficulty in target recognition and position positioning, according to the research in reference [18]. The problem of human joint location has yet to be solved perfectly. Literature [19] proposed that active contour method is a unique algorithm, which is applied in many places of image processing, such as using snake for target detection and segmentation. Firstly, the initial contour line is marked manually near the boundary of the target, and the internal force, external force, and binding force acting on the contour line are defined. By minimizing the curve energy, the curve can actively move to the real contour of the target of interest. 
Reference [20] proposed a target detection scheme based on automatic clustering detection and outlier detection. Sui Hua and others proposed a double-difference target detection algorithm based on the detection and classification of moving targets in a traffic monitoring system and the principle of maximum estimation. VSAM, a major visual surveillance project led by Carnegie Mellon University, developed a hybrid algorithm combining adaptive background subtraction with interframe subtraction. Literature [21] shows that feature-based tracking algorithms generally adopt correlation algorithms. They differ from region-based tracking algorithms in the object being correlated: region-based tracking uses the whole target in the sports image sequence, while feature-based tracking uses one or several local features of the target. The balloon model of the active contour is presented in the literature [22]. Such models enlarge the capture range of the initial contour, so that it does not have to be close to the real boundary of the object of interest, and they reduce the sensitivity to contour initialization [23]. Marker points of sports image sequences are also applied in rapid feedback systems for sports, which use big data analysis and mainly present their results as video images. In sports image processing, for example, segmentation techniques for moving objects support the superposition and comparison of different images and videos as well as the superimposed display of the same video. At present, all fast feedback systems have these functions. Visually displaying the technical differences between athletes, or between the movements of the same athlete at different times, is very helpful for athletes to understand and improve their technical movements.
Wireless Communications and Mobile Computing
Literature [24] proposed that the MSEA algorithm can quickly judge whether a candidate position needs a matching operation by comparing the gray information of the template with that of the local image. Literature [25] proposes that combining optical-flow-based segmentation with the depth information of the video enables the tracking of rigid and nonrigid targets. Lu et al. combined edge, optical flow, and shadow information to optimize the model during tracking and realized tracking of the human hand under changing conditions. This paper studies a Kinect-based method for automatically tracking marker points in sports image sequences. In early sports image research, target point positioning and tracking were done manually. Because of the large number of images to be processed, manual positioning and tracking take a lot of time, which limits the analysis of large image sets; the process is complex, its positioning accuracy is low, and it is not suitable for analyzing moving video images. An automatic fast positioning method can therefore be adopted, which greatly speeds up processing.

Principle and Algorithm of Kinect
The template matching method is mainly used to realize fast positioning in the moving image sequence. Its main idea is that the target point position in the first frame is entered manually as prior knowledge, and subsequent processing uses this prior knowledge to accurately register the moving target points between consecutive frames. The accurate position of the moving target in the later frame is thus obtained, so that the computer positions the moving target points automatically, improving processing speed and accuracy. The optical-flow-based tracking algorithm assumes that the gray values of corresponding target pixels are the same in consecutive frames. By calculating the optical flow field, the motion field of the pixels in the image is obtained and used to guide subsequent tracking. The feature point optical flow method obtains the optical flow vector at each feature point through feature matching. Compared with the global optical flow method, it requires less computation and is faster and more flexible. The optical flow method can track moving targets well against a dynamic background, but because of the aperture and occlusion problems, its estimate of the two-dimensional motion field is unstable, and additional model assumptions are needed to constrain the motion field. The method uses the color image collected by the color camera and the depth image collected by the infrared camera. The Kinect algorithm is applied to motion comparison analysis. The steps of the proposed method are shown in Figure 1.
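The template matching idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are grayscale images stored as nested lists, the similarity measure is the sum of absolute differences (SAD), and the search is limited to a small window around the position found in the previous frame. The names `match_template` and `radius` are hypothetical.

```python
def match_template(image, template, prev_pos, radius):
    """Find the template in `image` near `prev_pos`.

    image, template: 2D lists of gray values.
    prev_pos: (row, col) of the template's top-left corner in the previous frame.
    radius: half-size of the search window, in pixels (assumed parameter).
    Returns ((row, col), sad) for the best match.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    r0, c0 = prev_pos
    best_pos, best_sad = None, float("inf")
    # Only candidate positions that keep the template inside the image are tested.
    for r in range(max(0, r0 - radius), min(ih - th, r0 + radius) + 1):
        for c in range(max(0, c0 - radius), min(iw - tw, c0 + radius) + 1):
            sad = sum(
                abs(image[r + i][c + j] - template[i][j])
                for i in range(th)
                for j in range(tw)
            )
            if sad < best_sad:
                best_pos, best_sad = (r, c), sad
    return best_pos, best_sad
```

Restricting the search to a window around the previous position is what makes the positioning fast; the template itself would come from the manually marked first frame.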
Each feature point can be assigned one or more direction parameters based on the gradient direction characteristics of its local image. The neighborhood window of the feature point is sampled first, the gradient direction histogram of the neighborhood pixels is computed, and the contribution of each neighborhood pixel to the histogram is accumulated. The main direction of the feature point is given by the peak of the gradient histogram; any other direction whose histogram value reaches 80 percent of the peak is taken as a secondary direction. First, the affine invariant vector construction method and the occlusion criterion based on it are presented. Then, using prior knowledge of the tracked target, a method for predicting the trajectory and the occlusion area is discussed in detail. On this basis, the occlusion criterion for moving targets, which depends on the affine invariant vector, is modified. Tracking can also be divided into single-target and multitarget tracking depending on the number of targets. With the Kinect algorithm, the research object shifts from still images to sports image sequences. Analyzing multiple sports images makes it possible to understand the dynamic process and to obtain information that a single image cannot provide. Using the depth image collected by Kinect, the height of the horizontal bar is determined from the field of view and the depth value. In terms of tracked objects, one can track body parts such as the hands, face, head, and legs, or track the whole human body. Automatic tracking monitors the spatiotemporal changes of objects in the sports image sequence, including their existence, position, size, and shape.
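The main and secondary direction rule at the start of this paragraph can be illustrated with a small sketch. It assumes a grayscale patch stored as nested lists, central-difference gradients, 36 histogram bins, and magnitude-only weighting; the bin count and the absence of any spatial weighting are assumptions, and the function names are hypothetical.

```python
import math


def orientation_histogram(patch, bins=36):
    """Accumulate gradient magnitudes into a direction histogram."""
    hist = [0.0] * bins
    h, w = len(patch), len(patch[0])
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            dx = patch[r][c + 1] - patch[r][c - 1]
            dy = patch[r + 1][c] - patch[r - 1][c]
            mag = math.hypot(dx, dy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            # The modulo guards against an angle that rounds up to 360.
            hist[int(angle // (360.0 / bins)) % bins] += mag
    return hist


def dominant_orientations(hist, ratio=0.8):
    """Return (main_bin, secondary_bins) using the 80 percent peak rule."""
    peak = max(hist)
    if peak == 0:
        return None, []
    main = hist.index(peak)
    secondary = [i for i, v in enumerate(hist) if i != main and v >= ratio * peak]
    return main, secondary
```

A feature point with secondary bins would be described once per direction, which is why a single point can carry more than one direction parameter.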
The system uses Microsoft's Kinect sensor to extract information from the image for data analysis and to count and score the pull-up movement, so as to realize automatic testing and self-training. The system mainly includes three modules: a crossbar position recognition module, a user bone tracking module, and a counting and scoring module. Users can also evaluate their movements from the video playback and the scores, which helps standardize the movements. The system block diagram is shown in Figure 2.
Kinect obtains the depth map by emitting near-infrared light, so it actively tracks large objects whether or not there is enough ambient light. The target detection algorithm then searches the whole field of view, removes the background of the sports image sequence, and extracts multiple moving object regions, that is, it generates a differential binary image whose foreground is the moving object region. The three-dimensional coordinates of the bone points are obtained by Kinect, and joint vectors are generated from them. The joint control angles are obtained by calculating the included angles of the joint vectors. For the joint angles RElbowRoll and RElbowYaw, the bone points that Kinect needs to extract are Wrist_R, Elbow_R, Shoulder-R, and Spine-Shoulder, and their three-dimensional coordinates are written in the form $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$.
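The included-angle computation can be sketched as below. The sketch assumes each bone point is a 3D tuple in the sensor's coordinate system and computes only the geometric angle at the middle joint; how that angle is mapped onto RElbowRoll or RElbowYaw is not specified here and is left out. The function name `joint_angle` is hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b between the vectors b->a and b->c.

    a, b, c: 3D points (x, y, z), e.g. shoulder, elbow, wrist.
    """
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("joint points must be distinct")
    cos_theta = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

For the arm, calling `joint_angle(shoulder, elbow, wrist)` gives the bending angle at the elbow: close to 180 degrees for a straight arm and smaller as the arm bends.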

[Figure: flowchart. Recoverable labels: "Match failed"; "The distance between the nearest two frames and the frame to be matched is less than a certain threshold"; "Key partition image".]

Let the teaching data be $\xi_j = \{\xi_{s,j}, \xi_{t,j}\}$, $j \in \{1, 2, \ldots, N\}$, where $N$ is the number of data points contained in a single teaching, $\xi_{s,j}$ is the joint angle, and $\xi_{t,j}$ is the time value. It is assumed that each data point $\xi_j$ follows the probability distribution
$$p(\xi_j) = \sum_{k=1}^{K} P(k)\, p(\xi_j \mid k),$$
where $P(k)$ is the prior probability and $p(\xi_j \mid k)$ is the conditional probability distribution, which obeys a Gaussian distribution. Therefore, the whole teaching data set can be represented by a Gaussian mixture model (GMM), where $K$ is the number of Gaussian distributions constituting the model:
$$p(\xi_j \mid k) = \mathcal{N}(\xi_j; \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^{d} |\Sigma_k|}} \exp\left(-\frac{1}{2} (\xi_j - \mu_k)^{T} \Sigma_k^{-1} (\xi_j - \mu_k)\right),$$
where $d$ is the dimension of the teaching data encoded by the GMM. The parameters to be determined in the Gaussian mixture model are therefore $\{\pi_k, \mu_k, \Sigma_k\}$, which represent the prior probability $\pi_k = P(k)$, the expectation, and the covariance of the $k$th component, respectively. The EM algorithm is used to estimate the GMM parameters by finding their maximum likelihood estimate in the probability model.
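For reference, the standard EM updates for a Gaussian mixture model are written out below. They are the textbook form rather than a transcription of the authors' derivation. Here $\pi_k$, $\mu_k$, and $\Sigma_k$ are the prior, mean, and covariance of component $k$, $N$ is the number of data points, and the responsibility $\gamma_{jk}$ is a symbol introduced only for this sketch.

```latex
% E-step: responsibility of component k for data point \xi_j
\gamma_{jk} = \frac{\pi_k \, \mathcal{N}(\xi_j; \mu_k, \Sigma_k)}
                   {\sum_{i=1}^{K} \pi_i \, \mathcal{N}(\xi_j; \mu_i, \Sigma_i)}

% M-step: re-estimate the parameters from the responsibilities
N_k = \sum_{j=1}^{N} \gamma_{jk}, \qquad
\pi_k = \frac{N_k}{N}, \qquad
\mu_k = \frac{1}{N_k} \sum_{j=1}^{N} \gamma_{jk} \, \xi_j, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{j=1}^{N} \gamma_{jk} \, (\xi_j - \mu_k)(\xi_j - \mu_k)^{T}
```

The two steps are repeated until the log-likelihood $\sum_{j} \log \sum_{k} \pi_k \, \mathcal{N}(\xi_j; \mu_k, \Sigma_k)$ stops increasing appreciably.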
After the feature matching of the action sequences by the Kinect algorithm, the key point images of the two groups of action sequences are obtained. According to the characteristics of the action, we segment the action in two ways:
(1) Equal Time Interval Action Segmentation Method. Segment according to the time sequence of the two images matched by the features of the two groups of action sequences; that is, each matched image is divided evenly into five parts. Specifically, the first image is divided into five equal parts along the horizontal coordinate, and the second image is divided in the same way.
(2) Action Segmentation Method with Equal Key Point Proportion. Segment so that corresponding segments of the two matched images contain the same proportion of key points. The first image is divided into five equal parts along the horizontal coordinate. The proportion of key points falling in each segment of the first image, relative to all key points found, is then calculated; multiplying this proportion by the total number of key points found in the second image gives the boundary coordinates of each segment in the second image, and the second image is segmented at the corresponding key point intervals.
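The two segmentation rules can be sketched as follows. This is one possible reading of the description, not the authors' code: key points are reduced to their horizontal coordinates, and each boundary in the second image is placed midway between the two key points that straddle the target count. That midpoint rule, the function names, and the parameter names are assumptions.

```python
def equal_interval_bounds(width, parts=5):
    """Boundaries that split [0, width] into `parts` equal intervals."""
    return [width * i / parts for i in range(parts + 1)]


def proportional_bounds(xs1, width1, xs2, width2, parts=5):
    """Boundaries for image 2 that reproduce image 1's key point proportions.

    xs1, xs2: horizontal coordinates of the matched key points in each image.
    Returns (bounds1, bounds2), each a list of parts + 1 coordinates.
    """
    bounds1 = equal_interval_bounds(width1, parts)
    # Count the key points of image 1 that fall into each equal-width segment.
    counts = [0] * parts
    for x in xs1:
        counts[min(int(x * parts / width1), parts - 1)] += 1

    xs2_sorted = sorted(xs2)
    bounds2 = [0.0]
    cumulative = 0
    for k in range(parts - 1):
        cumulative += counts[k]
        # Number of image 2 key points that should lie left of this boundary.
        n = round(cumulative * len(xs2_sorted) / len(xs1))
        if n <= 0:
            bounds2.append(0.0)
        elif n >= len(xs2_sorted):
            bounds2.append(float(width2))
        else:
            bounds2.append((xs2_sorted[n - 1] + xs2_sorted[n]) / 2)
    bounds2.append(float(width2))
    return bounds1, bounds2
```

With this rule, a segment of the first image that holds, say, a fifth of its key points is paired with a segment of the second image holding about a fifth of that image's key points, even if the two segments have different widths.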

Kinect-Based Automatic Tracking Method of Sports Image Sequence Markers
Automatic tracking of sports image sequence landmarks based on Kinect is an important research topic in machine vision, with wide application in intelligent monitoring, video coding, and artificial intelligence. Visual information accounts for a very large proportion of the environmental information that people perceive; dynamic visual information is its main component, and sports contain a large amount of meaningful visual information. With the Kinect algorithm, the research object shifts from still images to sports image sequences. Analyzing multiple sports images makes it possible to understand the dynamic process and to obtain information that cannot be obtained from a single image. Objects and scenes often occlude one another in image sequences with complex backgrounds. To address this, this paper presents an occlusion detection and tracking method for multiple moving objects based on affine invariants. First, the construction of the affine invariant vector and the occlusion criterion based on it are given. Then, combined with prior knowledge of the tracked target, a method for predicting the trajectory and the occlusion area is discussed in detail. On this basis, the occlusion criterion for moving targets, which depends on the affine invariant vector, is modified. In terms of tracking angles, there are single views corresponding to a single camera, multiple views corresponding to multiple cameras, and omnidirectional views. The Kinect algorithm, as a concise nonparametric density estimation method, has been widely used in real-time target tracking in recent years because of its high efficiency in feature space search.
The rapid advancement of the hardware technology behind the Kinect algorithm has significantly improved the storage and processing speed for sports image sequence marker points and provides strong support for their analysis. To begin, the input sports image sequence data are preprocessed to improve image quality. The target detection algorithm then searches the entire field of view, removes the background, and extracts multiple moving object regions, producing a differential binary image whose foreground is the moving object region. A new method for extracting moving target regions from image sequences is proposed. First, the image sequence is differenced; the threshold for binarizing the difference image is selected automatically from an analysis of the histogram of the difference image. To locate moving targets, symmetrical difference and neighborhood filtering are used to overcome camera jitter and background jitter. Finally, the moving target region is extracted with a projection algorithm, and clutter block removal and region merging are used to make the extracted moving target region more accurate and continuous. The continuous tracking stage is built mainly on a prediction-matching-correction framework. First, the trajectory prediction algorithm forecasts the target's possible position in the next frame. The target template obtained during the detection and positioning stage is then matched within the prediction-based search range. The best match determines the target's position in this frame.
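The symmetrical difference and projection steps described above can be sketched as follows. The sketch assumes three consecutive grayscale frames stored as nested lists and a fixed threshold supplied by the caller; the histogram-based threshold selection, neighborhood filtering, clutter removal, and region merging are omitted. The function names are hypothetical.

```python
def symmetric_difference_mask(prev, curr, nxt, threshold):
    """Binary mask of pixels that changed both before and after `curr`.

    A pixel is foreground only if |curr - prev| and |nxt - curr| both exceed
    the threshold, which localises the moving object in the current frame.
    """
    h, w = len(curr), len(curr[0])
    return [
        [
            1
            if abs(curr[r][c] - prev[r][c]) > threshold
            and abs(nxt[r][c] - curr[r][c]) > threshold
            else 0
            for c in range(w)
        ]
        for r in range(h)
    ]


def projection_bbox(mask):
    """Bounding box (top, left, bottom, right) from row and column projections."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]
```

A plain two-frame difference marks both the old and the new position of the object; requiring a change in both differences keeps only the position occupied in the middle frame.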
The corresponding parameters must be adjusted so that they do not degrade the matching of the target's optimal position. Furthermore, when a position prediction algorithm based on trajectory fitting is used, the linear prediction algorithm is preferable if the target moves along an approximately straight line or with significant random motion; for smooth curved motion, the square prediction algorithm can be used. As a result, the specific algorithm scheme for target trajectory prediction must be chosen according to these selection principles, the actual application scenario, and the target motion characteristics.
Many methods and classifications of tracking exist today. In terms of tracked objects, one can track body parts such as the hands, face, head, and legs, or the whole human body; in terms of tracking angles, there are single views corresponding to a single camera, multiple views corresponding to multiple cameras, and omnidirectional views. Because of its high efficiency in feature space search, the Kinect algorithm, a concise nonparametric density estimation method, has become widely used in real-time target tracking in recent years. The purpose of the automatic tracking method is to track the spatiotemporal changes of objects in a sports image sequence, such as their existence, position, size, and shape. One option is to use face recognition to determine the position of the mandible. However, during a pull-up the crossbar occludes the face and prevents face recognition, so the positional relationship between the mandible and the crossbar is judged inaccurately when the body reaches its highest point. The Kinect depth image can be used to calculate the distance between the user and the Kinect camera. However, because the background in a real-world test can be quite complicated, the depth image must be thresholded to separate the human body from the background before the human motion is analyzed.
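Depth thresholding of this kind can be sketched in a few lines. The sketch assumes the depth image is a nested list of distances in millimetres with 0 marking pixels that have no valid reading, and that the user stands inside a known distance band; the band limits and function names are hypothetical.

```python
def segment_by_depth(depth_mm, near_mm, far_mm):
    """Binary mask: 1 where the depth lies inside [near_mm, far_mm], else 0."""
    return [[1 if near_mm <= d <= far_mm else 0 for d in row] for row in depth_mm]


def mean_user_distance(depth_mm, mask):
    """Average depth over the masked pixels, or None if the mask is empty."""
    values = [
        d
        for depth_row, mask_row in zip(depth_mm, mask)
        for d, m in zip(depth_row, mask_row)
        if m
    ]
    return sum(values) / len(values) if values else None
```

Pixels without a reading fall outside the band and are discarded together with the background, so the average covers only the user.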
Given template descriptions of the target and of the candidate target in the sports image sequence, together with a criterion for measuring their similarity, target tracking becomes a search in the current frame for the candidate, centered at position Y, that minimizes the distance function to the original target. The starting point of the search is the position of the target in the previous frame, and the area around it is searched. Because of the above problems with face tracking and depth images, the project uses bone tracking to follow the user's joint points and compares the position of the user's mandible point with the determined position of the crossbar to judge whether the mandible has crossed the crossbar. One repetition is counted each time the mandible crosses the crossbar twice.
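The counting rule, one repetition per two crossings of the crossbar, can be sketched as a small state machine. It assumes a per-frame sequence of mandible heights and a fixed crossbar height in the same units, with larger values meaning higher; scoring by arm angle is not included. The function name is hypothetical.

```python
def count_pullups(chin_heights, bar_height):
    """Count repetitions: one for each upward crossing followed by a downward one."""
    count = 0
    above = False
    for y in chin_heights:
        if not above and y > bar_height:
            above = True  # first crossing: the chin rises above the bar
        elif above and y <= bar_height:
            above = False  # second crossing: the chin drops back below the bar
            count += 1
    return count
```

A repetition that is still in progress at the end of the sequence, with the chin above the bar, is not counted until the chin comes back down.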

Experimental Results and Analysis
In this paper, the Kinect algorithm is used to simulate and compare the predictions of the two prediction algorithms on different sports image sequence trajectories, and the applicable scenes and optimal selection principles of the algorithms are summarized from the experimental results. Target features such as lines, curves, areas, and surfaces are extracted from the image. The advantage of perspective-based modeling is that, owing to its mathematical generality, it can handle all the kinds of variability mentioned above within a consistent framework, and its computational complexity is much lower than that of a 3D representation. This way of representing targets can be used to track objects with linked structures, such as the human hand or torso. The simulation experiment uses a 3-point linear predictor and a 5-point square predictor to test the prediction algorithm based on trajectory fitting. For the Kalman filter prediction algorithm, the initial state vector estimate $x(0)$, the initial error covariance matrix $P(0)$, the transition matrix $A$, and the observation matrix $H$ can be determined in advance as described in Section 4.1.3 of this chapter. Three simulation experiments were conducted to compare three trajectories. The experimental results are shown in Figures 3-8.
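The two trajectory-fitting predictors can be written in closed form. The sketch below assumes they are the one-step-ahead extrapolations of a least-squares line through the last 3 samples and a least-squares parabola through the last 5 samples, taken at unit time steps; the paper does not give its coefficients, so this form is an assumption, and the function names are hypothetical.

```python
def linear_predict_3pt(x):
    """Next value from a least-squares line through the last three samples."""
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    return (4 * x[-1] + x[-2] - 2 * x[-3]) / 3


def quadratic_predict_5pt(x):
    """Next value from a least-squares parabola through the last five samples.

    The weight of x[-2] is zero in this closed form, so it does not appear.
    """
    if len(x) < 5:
        raise ValueError("need at least 5 samples")
    return (9 * x[-1] - 4 * x[-3] - 3 * x[-4] + 3 * x[-5]) / 5
```

Each coordinate of the target position is predicted separately. The linear predictor is exact on straight-line motion, and the square predictor is exact on parabolic motion, but its larger weights also amplify measurement noise more strongly.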
The simulation results show that the Kalman prediction algorithm performs best in Figures 3-5. Its predicted trajectory is smooth and its relative error is small, which reflects its robustness. When the signal-to-noise ratio is high, the absolute prediction error of the linear prediction algorithm is less than 1, and its actual prediction performance is comparable to that of the Kalman prediction algorithm. The square predictor performs poorly, with obvious errors especially when the signal-to-noise ratio is low. The experiments show that the square predictor fails to predict straight lines and target trajectories with significant random motion. In Figures 6-8 the overall performance of the Kalman prediction algorithm is still good, but it is unstable in the initial prediction stage and returns to normal after the observed values correct it. As a result, the accuracy of its first few prediction points is low. In this case the prediction error of the square predictor is lower than that of the linear predictor, which shows that the square predictor is better than the linear predictor at predicting conic motion. A signal-similarity-based global motion estimation method between image sequences is proposed. This method divides the image using a new quadtree-like structure and then gradually focuses on segmenting the image reference structure that [...] meet the real-time requirements of the system, the Kalman filter prediction algorithm will have better prediction accuracy than the other two algorithms, especially when the target motion is highly random or the scene noise interference is large. If the real-time requirement of the system is high, the trajectory-fitting prediction algorithm, with its small amount of calculation, simple structure, and easy implementation, is the main candidate.
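The behaviour of the Kalman predictor, including its weak first few predictions, can be reproduced with a minimal one-dimensional sketch. Everything in it is an assumption made for illustration: a constant-velocity state (position, velocity) with transition matrix A = [[1, 1], [0, 1]], observation matrix H = [1, 0], zero initial state, initial covariance 100 on the diagonal, and noise levels `q` and `r`. The settings actually used in the experiments are not reproduced here.

```python
def kalman_cv_predict(measurements, q=1e-3, r=1.0):
    """One-step-ahead position predictions from a constant-velocity Kalman filter.

    Returns a list where element i is the prediction for time i + 1.
    """
    x, v = 0.0, 0.0  # assumed initial state estimate
    p00, p01, p10, p11 = 100.0, 0.0, 0.0, 100.0  # assumed initial covariance
    predictions = []
    for z in measurements:
        # Correction with the observation z (H = [1, 0]).
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        resid = z - x
        x, v = x + k0 * resid, v + k1 * resid
        p00, p01, p10, p11 = (
            (1 - k0) * p00,
            (1 - k0) * p01,
            p10 - k1 * p00,
            p11 - k1 * p01,
        )
        # Prediction: x <- A x, P <- A P A^T + Q with Q = q * I.
        x = x + v
        p00, p01, p10, p11 = (
            p00 + p01 + p10 + p11 + q,
            p01 + p11,
            p10 + p11,
            p11 + q,
        )
        predictions.append(x)
    return predictions
```

With a zero initial velocity the first prediction lags a moving target, and the error shrinks as observations correct the state, which is consistent with the initial instability noted above.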

Conclusions
Moving object detection and tracking in image sequences is one of the most fundamental and difficult topics in computer vision. It covers a wide range of disciplines, including computer science, optics, mathematics, cognitive science, and control science, and its study is important both practically and theoretically. The feature extraction of sports image sequence landmark points constructed in this paper makes automatic tracking feasible: it can extract the key joint information of the moving human body and, with the help of an auxiliary system, compare standard movements with the movements actually performed. Bone tracking is used to determine the user's jaw position, and the movement is counted and scored using the distance between the mandible and the crossbar and the bending angle of the arm, following the national physical fitness test standard. First, the image sequence is differenced; then, based on an analysis of the histogram of the difference image, the Kinect algorithm automatically selects a threshold to binarize the difference image. Finally, the Kinect algorithm uses a projection algorithm to extract moving target areas and applies clutter removal and region merging to improve the accuracy and continuity of the extracted areas. The specific algorithm for target trajectory prediction must be chosen according to the selection principles above, the actual application scenario, and the target motion characteristics.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.