Real-Time Capture of Snowboarder’s Skiing Motion Using a 3D Vision Sensor

Owing to environmental interference and the high speed of the sport, ski motion capture suffers from problems such as inaccurate capture, motion delay, and motion loss, so that the athlete's actual motion and the motion of the virtual character become inconsistent. To solve these problems, a real-time skiing motion capture method for snowboarders based on a 3D vision sensor is proposed. The method combines a Time of Flight (TOF) camera and a high-speed vision sensor into a motion acquisition system. The collected motion images are fused into a complete motion image, and the pose is solved. The pose data is bound to a constructed virtual character model, driving the model to complete the snowboarding motion synchronously and realizing real-time capture of the skiing motion. The results show that the motion accuracy of the system reaches 98.6%, improving the capture effect, and that the motion matching proportion is higher and more practical. The system also performs well in the evaluation of motion delay and motion loss.


Introduction
Snowboarding is an intense sport, carried out mainly in winter, and is one of the core events of the Winter Olympic Games [1]. The sport is exciting, spectacular, and dangerous, and its training requirements are rather high.
Physical quality consists of technical, psychological, tactical, and other factors. In recent years, physical fitness training in China has made great progress. In theory, coaches no longer blindly follow the traditional training mode of "three importances and one greatness," and training means and methods are constantly being innovated. However, China's physical training is still in a stage of catching up and learning, and a large gap with other countries remains. Abroad, physical training pays more attention to the comprehensive and balanced development of physical elements and has put forward many influential training concepts. Functional physical training is a popular training concept of recent years, but owing to the lack of theoretical knowledge of functional physical training in China, a comprehensive and systematic understanding is still lacking. This study collects a large number of the latest relevant domestic and foreign publications, summarizes and organizes them, and applies them to cross-country skiing physical training, so as to contribute to improving the competitive ability of cross-country skiing teams.

Motion capture, also known as motion tracking, collects athletes' motion data in real time and then uses these data to drive a virtual character that displays the athletes' motions [2]. In the skiing training of professional athletes, motion capture is very important: it can be used to observe an athlete's moving posture in real time, which is of great practical significance for correcting wrong motions and improving skill level. At present, the accuracy of the motion capture systems used by Chinese athletes in skiing training is not high. Because skiing motion is fast and difficult, captures are easily missed.
If there is no professional system to capture skiers' motion over a long period, coaches cannot correct mistakes in the skiing process in time, which seriously affects the training progress of skiers [3]. To solve these problems, a method better suited to skiing motion capture is proposed: real-time capture of snowboarders' skiing motion based on a 3D vision sensor. 3D display technology is a new technology that has developed rapidly in recent years and is currently widely used in animation production. Compared with 2D presentation, 3D display technology renders the picture more realistically and makes viewers feel immersed. The 3D vision sensor is the main device for realizing 3D technology. Building on the original system, 3D technology is applied in this study in order to provide effective help for ski training.
The main contributions of this paper are as follows:

(1) Missed captures are reduced, and the accuracy of the system in capturing motion is improved.

(2) Applying a 3D vision sensor to real-time capture of snowboarders' skiing motion improves the performance of motion capture, making it easier to capture feature points and simulate motion.

(3) Different datasets are used to simulate the real-time skiing motion capture system of this paper, and the feasibility of the system is tested.

Compared with the above systems, this paper uses a TOF camera and a high-speed vision sensor as the motion acquisition system. This greatly improves the accuracy of the system's motion simulation, and it is less affected by environmental interference. It can be applied to motion capture in various environments and has good application prospects.

Real-Time Capture of Snowboarding Motion Using a 3D Vision Sensor
Snowboarding is dangerous, and its motions are difficult to learn. Ordinary people generally need several months before they can slide alone. However, for professional athletes under competition pressure, merely being able to glide is not enough; they need continuous training to achieve excellent results in formal competition. The training intensity of professional athletes is much higher than that of ordinary people, especially in the control of motion details. To assist the training of skiers, a real-time snowboarding capture method based on a 3D vision sensor is proposed in this study. The method mainly addresses the following three problems:

(1) Interference from ambient light

(2) Motion too fast to capture

(3) Inconsistency between the athlete's actual motion and the motion of the virtual character model

The first two problems are the main causes of the last one; solving them therefore resolves the last problem and finally improves the accuracy of motion capture. The study is divided into three parts: snowboarding motion data acquisition, pose solution, and virtual character model establishment and driving. Each part is analyzed below.

3.1. Snowboarding Motion Image Acquisition.
To reduce the influence of ambient light interference and motion speed on motion capture, this design uses two kinds of 3D vision sensors to form an acquisition system, improving the comprehensiveness and integrity of motion data acquisition [8][9][10]. One sensor is a TOF camera, which suffers little interference from ambient light; the other is a high-speed vision sensor, which can capture fast-moving targets. Combining the two sensors reduces data loss during motion capture.
3.1.1. TOF Depth Camera. The TOF depth camera collects reflected light to build a depth-mapped image. The distance from the camera to the object can be calculated from the travel time of the light pulse, where D refers to the distance and Δt refers to the time for the pulse signal to travel from the camera to the target and back [11].
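The distance relation itself is elided in the text; given the definitions of D and Δt, it can be reconstructed as the standard TOF round-trip formula, with c the speed of light (the factor 1/2 accounts for the out-and-back path):

```latex
D = \frac{c \,\Delta t}{2}
```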
3.1.2. High-Speed Vision Sensor. Compared with ordinary sensors, high-speed vision sensors adopt a stacked structure and a back-illuminated pixel array to improve their parallel processing ability [12, 13]. According to related research, high-speed vision sensors can detect and track targets at up to 1000 frames per second. In this paper, a high-speed vision sensor is used to collect data on the dynamic target in real time, so as to reduce the probability of missed motion acquisition.
After the snowboarders' skiing motion images are collected by the two 3D vision sensors, image fusion is required to integrate the two images. The fusion process is as follows:

Step 1. Multiscale transformation.
The motion images from the two snowboarding sources are decomposed by wavelet transform, yielding two kinds of coefficients: high-frequency coefficients and low-frequency coefficients.
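The multiscale decomposition of Step 1 can be sketched with a one-level 2D wavelet transform; the paper does not name the wavelet, so the Haar wavelet is assumed here, and `haar_dwt2` is an illustrative helper:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2D Haar wavelet decomposition of an image.

    Returns the low-frequency (approximation) coefficients and the three
    high-frequency (detail) bands, mirroring the decomposition described
    in the text.  Image height and width are assumed to be even.
    """
    a = img.astype(float)
    # Transform rows: average (low-pass) and difference (high-pass) of pixel pairs.
    lo_r = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi_r = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # Transform columns of each row band.
    LL = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0  # low-frequency coefficients
    LH = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0  # high-frequency (vertical detail)
    HL = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0  # high-frequency (horizontal detail)
    HH = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0  # high-frequency (diagonal detail)
    return LL, (LH, HL, HH)
```

In practice each source image would be decomposed this way, the coefficients fused band by band as described below, and the result inverted to obtain the fused image.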
(1) Edge feature

The Canny operator is applied to the low-frequency coefficients of the two source snowboarding motion images, Y_{j,A} (low-frequency coefficient of the TOF depth image) and Y_{j,B} (low-frequency coefficient of the high-speed vision sensor image), to extract edge features, and the binary edge features Z_{j,A} and Z_{j,B} are then constructed [14]. The construction formula is as follows: where XOR refers to the exclusive-or operation, XOR(Y_{j,A}, Y_{j,B}) means applying it to the low-frequency coefficients of the TOF depth image and the high-speed vision sensor image, and AND is the logical-and operation.
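Since the construction formula itself is not reproduced in the text, the sketch below shows one plausible XOR/AND combination of two binary Canny edge maps; the function name `edge_feature_maps` and this particular combination are assumptions:

```python
import numpy as np

def edge_feature_maps(edge_a, edge_b):
    """Construct binary edge features Z_A, Z_B from two binary edge maps.

    The paper combines the Canny edges of the two sources with XOR and
    AND operations; the exact formula is not given, so one plausible
    reading is used here: edges present in only one source (XOR),
    intersected with that source's own edge map (AND).
    """
    diff = np.logical_xor(edge_a, edge_b)   # edges present in exactly one source
    z_a = np.logical_and(edge_a, diff)      # edge features unique to source A
    z_b = np.logical_and(edge_b, diff)      # edge features unique to source B
    return z_a, z_b
```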
(2) Gradient feature

The gradient feature extraction formula [15] is as follows:

Wireless Communications and Mobile Computing
where S(i, j) refers to the average gradient of the pixel at the low-frequency coefficient (i, j), Δf_x(i, j) and Δf_y(i, j) are the differences of the image at the regional window's central pixel (i, j) in the x and y directions, and X is the number of pixels within the regional window in the x and y directions.
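A common form of the average-gradient measure consistent with the definitions above can be sketched as follows; the exact normalisation used in the paper is not shown, so the usual mean of sqrt((Δf_x² + Δf_y²)/2) over the window is assumed:

```python
import numpy as np

def average_gradient(window):
    """Average gradient of a regional window, a sharpness measure for
    comparing low-frequency coefficients.  Assumes the common definition:
    the mean of sqrt((dfx^2 + dfy^2) / 2) over the window.
    """
    w = window.astype(float)
    dfx = np.diff(w, axis=1)[:-1, :]  # horizontal first differences
    dfy = np.diff(w, axis=0)[:, :-1]  # vertical first differences
    return float(np.mean(np.sqrt((dfx ** 2 + dfy ** 2) / 2.0)))
```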

(3) Extraction of signal intensity feature
For the high-frequency coefficients obtained after image decomposition, signal intensity features are extracted. The extraction formula is as follows: where R_AB means the correlation signal intensity ratio, G_{j,A} means the signal intensity in the TOF depth image region window, and G_{j,B} means the signal intensity in the high-speed vision sensor image region window.
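The signal-intensity ratio can be sketched as a window-energy ratio; treating G as the sum of squared high-frequency coefficients in the window is an assumption, as the paper's exact definition is not reproduced:

```python
import numpy as np

def signal_intensity_ratio(hf_a, hf_b, eps=1e-12):
    """Correlation signal-intensity ratio R_AB between the high-frequency
    coefficient windows of the two sources.  Regional signal intensity is
    assumed to be the window energy (sum of squared coefficients).
    """
    g_a = float(np.sum(np.square(hf_a)))  # intensity of the TOF-image window
    g_b = float(np.sum(np.square(hf_b)))  # intensity of the high-speed-sensor window
    return g_a / (g_b + eps)              # eps avoids division by zero
```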
According to the above features, fusion is carried out with the following formula: where H_AB(i, j) means the integrated feature of the two kinds of images, S_{j,A}(i, j) means the gradient feature of the TOF depth image, and S_{j,B}(i, j) means the gradient feature of the high-speed vision sensor image [16].
The fused high-and low-frequency coefficients are transformed by wavelet multiscale inverse transform to obtain the fused athlete motion image.

3.2. Snowboarding Motion Pose Solution.
Based on the fused image, the pose is solved. The solution process is shown in Figure 1. The attitude solution is based on Euler angles.

According to Figure 1, the calibrated TOF camera image and the high-speed vision sensor image are first input into the distortion correction module for image correction; the image is then binarized by setting pixel gray values, yielding a black-and-white image. The image data is transmitted to the target recognition module, which recognizes the target and extracts the coordinates and depth value of the target center; the three-dimensional coordinates and relative distance matrix of the image target center are then calculated, and target matching is performed. When the number of matched pairs is greater than 3, the pose can be solved; if it is less than 3, the three-dimensional coordinates and relative distance matrix of the image center must be recalculated. After the pose of one image is solved, the next image is processed.

The pose solution is the key and core step: it calculates the pose data of the target [17, 18]. Suppose the same point has 3D coordinate f in the object coordinate system Oc-XcYcZc and 3D coordinate g in the camera coordinate system Oo-XoYoZo, where R represents the rotation matrix and V represents the translation matrix.
This is transformed into the following form and then optimized to solve for the rotation matrix and the translation matrix.
where f′ is the optimized solution, f_i is the coordinate value of target i in Oc-XcYcZc, g_i is the coordinate value of the same target i in Oo-XoYoZo, and w_i is the weight of target i in the calculation [19].
where f is the weight center of the targets in Oc-XcYcZc, g is the weight center of the targets in Oo-XoYoZo, w_i is the weight of target i in the calculation, f_i is the coordinate value of target i in Oc-XcYcZc, g_i is the coordinate value of the same target i in Oo-XoYoZo, and the summation runs from i = 1 to n.
Find the center vector of the target in two coordinate systems.
where i = 1, 2, ⋯, n; α_i and β_i are the center vectors of target i in Oc-XcYcZc and Oo-XoYoZo, f_i is the coordinate value of target i in Oc-XcYcZc, f is the weight center of the targets in Oc-XcYcZc, g_i is the coordinate value of the same target i in Oo-XoYoZo, and g is the weight center of the targets in Oo-XoYoZo.
This forms the center vector matrices α = (α_1, α_2, ⋯, α_n) and β = (β_1, β_2, ⋯, β_n), from which the scale factor S is calculated. The calculation formula is as follows: where W is the diagonal matrix composed of w_i and T is the transpose symbol. The matrix S is then decomposed to obtain the rotation matrix R and the translation matrix V.
where (x, y) refers to the pixel coordinate, g refers to the weight center of the targets in Oo-XoYoZo, f refers to the weight center of the targets in Oc-XcYcZc, R is the rotation matrix, V is the translation matrix, xy^T is the rotation matrix of the pixel coordinate (x, y), and det(xy^T) is the determinant of the transpose matrix of the pixel coordinate (x, y).
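The weighted pose solution described above (weight centers, center vectors, the scale-factor matrix S built from α, W, and β, and the decomposition of S into R and V) can be sketched with an SVD-based solver. The function name `solve_pose` and the SVD route are assumptions, since the paper does not spell out the decomposition:

```python
import numpy as np

def solve_pose(f_pts, g_pts, w=None):
    """Weighted rigid pose between matched targets: find rotation R and
    translation V with g_i ≈ R f_i + V, following the steps in the text
    (weight centers, center vectors, S = alpha W beta^T, decomposition).
    The SVD-based decomposition of S is an assumed implementation detail.
    """
    f_pts = np.asarray(f_pts, dtype=float)
    g_pts = np.asarray(g_pts, dtype=float)
    n = len(f_pts)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    w = w / w.sum()
    f_bar = (w[:, None] * f_pts).sum(axis=0)   # weight center in Oc-XcYcZc
    g_bar = (w[:, None] * g_pts).sum(axis=0)   # weight center in Oo-XoYoZo
    alpha = f_pts - f_bar                      # center vectors in Oc-XcYcZc
    beta = g_pts - g_bar                       # center vectors in Oo-XoYoZo
    S = (w[:, None] * alpha).T @ beta          # 3x3 scale-factor matrix
    U, _, Vt = np.linalg.svd(S)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    V = g_bar - R @ f_bar                      # translation between the centers
    return R, V
```

Given at least 3 non-collinear matched targets (consistent with the matching threshold in Figure 1), this recovers R and V exactly for noise-free data and in the least-squares sense otherwise.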
Finally, the rotation matrix R is transformed into Euler angle form to obtain snowboarding motion data.
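The final conversion of R to Euler angles can be sketched as follows; the Z-Y-X (yaw-pitch-roll) convention is an assumption, as the text does not state which Euler convention is used:

```python
import numpy as np

def rotation_to_euler_zyx(R):
    """Convert a 3x3 rotation matrix to Z-Y-X (yaw, pitch, roll) Euler
    angles in radians.  The Z-Y-X convention is assumed here.
    """
    pitch = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    if abs(R[2, 0]) < 1.0 - 1e-9:        # regular case
        yaw = np.arctan2(R[1, 0], R[0, 0])
        roll = np.arctan2(R[2, 1], R[2, 2])
    else:                                 # gimbal lock: pitch = ±90°
        yaw = np.arctan2(-R[0, 1], R[1, 1])
        roll = 0.0
    return yaw, pitch, roll
```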

Establishment and Driving of the Virtual Character Model.
The establishment of the virtual character model means building a movable three-dimensional human model. On this basis, combined with the calculated snowboarding motion data, the virtual character's motion can be driven synchronously. The process is divided into the following five steps:

Step 1. Establish a human skeleton model. Human movement is mainly driven by bones. The human skeleton structure can be simplified into 16 main bones, and all movement postures can be reflected on these 16 bones, as shown in Figure 2.
Step 2. Skin the bones. To ensure the fidelity of the athlete's virtual model, a layer of human-like skin is placed over the skeleton to improve the three-dimensional appearance of the model.
Step 3. Fit the snowboarding motion data obtained in the previous section to the human model.
Step 4. Judge whether the fitting curve is close to the motion trajectory. If it is close, output the continuous bone changes; otherwise, refit.
Step 5. Drive the virtual model to complete a series of skiing motions.
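The 16-bone skeleton of Step 1 can be sketched as a parent map over which per-bone Euler angles are composed root-to-leaf when driving the model. The bone names and hierarchy below are illustrative assumptions, since Figure 2 is not reproduced here:

```python
# A minimal sketch of a 16-bone skeleton model.  Bone names and the
# parent-child hierarchy are assumptions; the pelvis is taken as root.
SKELETON_PARENTS = {
    "pelvis": None,
    "spine": "pelvis", "neck": "spine", "head": "neck",
    "l_shoulder": "spine", "l_elbow": "l_shoulder", "l_wrist": "l_elbow",
    "r_shoulder": "spine", "r_elbow": "r_shoulder", "r_wrist": "r_elbow",
    "l_hip": "pelvis", "l_knee": "l_hip", "l_ankle": "l_knee",
    "r_hip": "pelvis", "r_knee": "r_hip", "r_ankle": "r_knee",
}

def chain_to_root(bone):
    """Kinematic chain from a bone back to the root: the order in which
    per-bone transforms are composed when driving the virtual model."""
    chain = [bone]
    while SKELETON_PARENTS[chain[-1]] is not None:
        chain.append(SKELETON_PARENTS[chain[-1]])
    return chain
```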

Experimental Environment and Dataset
PKU-MMD is a large long-sequence multimodal, multiview dataset released in 2017. Its images are captured from multiple views, and it includes 1076 untrimmed long video sequences and 20000 trimmed samples. miniImageNet is a subset of the ImageNet dataset containing 100 categories, each with 600 pictures; it is widely used in small-sample simulation experiments. In this paper, the PKU-MMD and miniImageNet datasets are selected. The training parameters of the snowboarder's skiing motion are shown in Table 1.
There are many snowboarding motions, among which the grab is the most typical. Athletes need to complete technical movements such as turning and somersaulting on the snowboard. This technique involves a large range of motion, places high demands on the athlete's skill level, and is one of the required items in athletes' training. On this basis, taking this skiing movement as the research object, the motion sequence images are collected by the vision acquisition system at an interval of 0.20 s per frame; a total of 100 sequence images are collected over 20 s.
The TOF camera in the vision acquisition system is the Intel RealSense L515 LiDAR-based TOF depth camera, which can be used under various indoor and outdoor lighting conditions; its image resolution reaches 1280 × 720 at 30 fps, its maximum working range is 0.4~20 m, and its accuracy error is less than 2.7%. It is also equipped with an IMU to automatically monitor the moving route of the object. The high-speed vision sensor is the Sony IMX382, which can detect and track a moving target at 1000 frames per second, avoiding the loss of image information caused by the target's speed during capture. The algorithm is used for motion detail comparison, and the process is shown in Figure 3.
The pose solution results are fed to the established athlete virtual model to drive the model to move synchronously, and the consistency between the athlete's actual motion and the virtual character model's motion is then compared.
Using the above method, a series of motion poses in snowboarding are solved and the Euler angles of each bone node are obtained; the resulting line diagram is shown in Figure 4.
The standard motion line image in Figure 4 and the motion line image presented by the virtual character model are segmented at equal time intervals and then analyzed according to the experimental indicators.

Experimental Indicators.
There are two main evaluation indexes for the skiing motion comparison, and the consistency of motion capture is comprehensively analyzed from these two indexes, detailed below:

(1) Motion complexity

The more feature points, the more complex the motion. By comparing the key points between the virtual character model and the standard motion, the execution results of the two groups of skiing motions are judged.
where M_v represents the accuracy of motion extraction and C_N(r) represents the accuracy of motion extraction.
(2) Matching proportion

Matching proportion refers to the ratio between the matched key points of the two motion groups and the number of standard motion features. The greater the ratio, the higher the matching degree.
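The matching proportion can be sketched as follows; the distance threshold `tol` used to decide whether two key points match is an assumed criterion, since the text defines only the ratio itself:

```python
import numpy as np

def matching_proportion(model_pts, standard_pts, tol=0.05):
    """Matching proportion between virtual-model and standard key points:
    the number of model points within `tol` of some standard point,
    divided by the number of standard motion features.  Values closer
    to 1 indicate a closer match.
    """
    standard_pts = np.asarray(standard_pts, dtype=float)
    model_pts = np.asarray(model_pts, dtype=float)
    matched = 0
    for p in model_pts:
        d = np.linalg.norm(standard_pts - p, axis=1)  # distance to every standard point
        if d.min() <= tol:
            matched += 1
    return matched / len(standard_pts)
```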
(3) Motion accuracy: obtained by comparing the number of frames

(4) Motion delay rate: the motion delay of each system, obtained by comparing the video animation

(5) Motion loss rate: the loss in motion execution, obtained by comparing each image

4.3. Results and Discussion. To verify the effectiveness of real-time capture of snowboarders' skiing motion based on a 3D vision sensor, an experimental analysis was carried out. Under the same test conditions, motion capture was performed with the four methods given in literature [4] to literature [1], fine motion comparison was then carried out, and the results were compared with those of the studied method, as shown in Table 2. According to Table 2, compared with the other five methods, the difference between the key points of the 100-frame standard motion sequence images and the 100-frame virtual character motion sequence images is the smallest, and the matching proportion with the standard motion sequence images is larger. This shows that the skiing motion displayed by the