A New Kinect-Based Posture Recognition Method in Physical Sports Training Based on Urban Data

Physical data is an important aspect of urban data and provides a guarantee for the healthy development of smart cities. Students' physical health evaluation is an important part of school physical education, and posture recognition plays a significant role in physical sports. Traditional posture recognition methods suffer from low accuracy and high error rates due to the influence of environmental factors. Therefore, we propose a new Kinect-based posture recognition method for a physical sports training system based on urban data. First, Kinect is used to obtain the spatial coordinates of the human body joints. Then, the angles are calculated by the two-point method and a body posture library is defined. Finally, posture recognition is performed by matching the measured angles against the posture library. We adopt this method to automatically evaluate the effect of physical sports training, and it can be applied to the pull-up in students' sports tests. The position of the crossbar is determined from the depth sensor information, and the position of the mandible is determined using bone tracking. The bending degree of the arm is determined from the three key joints of the arm. The distance from the jaw to the bar and the straightness of the arm are used to score and count the movements. Meanwhile, the user can adjust his position by playing back the action video and the score, so as to achieve a better training effect.


Introduction
Urban big data is the massive amount of dynamic and static data generated by subjects and objects including urban facilities, organizations, and individuals, collected and collated by city governments, public institutions, enterprises, and individuals using new-generation information technologies. Big data can be shared, integrated, analyzed, and mined to give people a deeper understanding of the status of urban operations and to help them make more informed, scientific decisions on urban administration, thereby optimizing the allocation of urban resources, reducing the operating costs of the urban system, and promoting the safe, efficient, green, harmonious, and intelligent development of cities as a whole.
Nowadays, sports hardware facilities are also improving with the development of smart cities [1][2][3], and the quality of people's lives has been guaranteed and enhanced. The human body exhibits a rich variety of movements [4]. Many applications need a more comprehensive analysis of human movement, such as behavior monitoring, movement analysis, and medical rehabilitation. If the human body can be identified and tracked in real time, the posture of the human body can be recognized accurately, which makes it much more convenient to observe and learn human behavior [5]. Therefore, it is necessary to find a good way to recognize human posture.
In recent years, human motion recognition based on Kinect has shown great significance in the field of medicine, and many institutions are carrying out relevant research. Enea et al. [6] installed a Kinect on a walker to extract information about the legs for medical gait analysis. Chen et al. [7] used depth information extracted from Kinect to detect the body's joints and used a random forest classifier to classify depth image pixels into multiple parts of the body. Thang et al. [8] adopted human anatomy marks and the human body skeleton model to obtain a depth map and estimate the posture of the human body; the geodesic distance was used to measure the distance between different parts of the body. Xu et al. [9] used the Kinect sensor to obtain human body images and identify 3D human body posture. Yang et al. [10] utilized a Kinect device to capture the scene and estimate the body's limb posture. Moreover, other researchers have done related work to improve posture recognition [11][12][13]. However, these methods only take local physical characteristics into consideration and ignore global features, which leads to a poor effect on physical sports training. Although deep learning methods have been applied in many fields, they are not yet mature in physical training and rehabilitation. In this paper, we focus on improving Kinect technology for posture recognition.
Many medical experts have brought Kinect into medical rehabilitation because it is cheap and practical, for example for rehabilitation treatment. The basic idea is to use depth information and skeleton tracking technology to track the limbs and determine the position of the body, and finally to identify the movement of the body. In reference [14], Kinect rehabilitation training could effectively enhance the quality of rehabilitation: it not only assisted patients in recovering motor function but also improved their psychological state and reduced negative emotions. Wang et al. [15] designed a rehabilitation system for the human shoulder that required the patient to touch a set point with his or her hand; however, it could not measure the specific location of the joint in real time. Rehabilitation training usually does not require patients to carry out rapid, large movements, but it does demand high tracking accuracy of Kinect's human skeleton. If the skeletons of the hands and legs can be tracked accurately, the movements of patients can be identified more accurately, so as to achieve a better recovery effect.
The remainder of the paper is organized as follows. Section 2 gives an outline of the Kinect imaging principle. Section 3 describes the posture recognition method in detail. The performance and robustness are evaluated in Section 4, and the conclusion is drawn in Section 5.

Kinect Imaging Principle
Kinect emits near-infrared light to obtain a depth map and can actively track large objects regardless of the ambient light level. The depth image is captured by an infrared projector and an infrared camera whose fields of projection and reception overlap; the processes of transmitting, capturing, and computing the visual representation are similar.
Structured light carries specific patterns such as points, lines, and surfaces. The principle of depth image acquisition based on structured light is to project the structured light into the scene and use an image sensor to capture the pattern corresponding to the structured light [16]. Because the structured light is deformed according to the shape of the object, the depth of each point in the image can be calculated from the captured pattern using the triangulation principle.
The Microsoft Kinect depth camera uses light-coding technology, which differs from the traditional two-dimensional pattern projection of structured light: Kinect's light-coding infrared transmitter encodes the three-dimensional depth of the scene. The light source for light coding is laser speckle, the result of diffuse reflection of a laser, which forms random spots that differ at every position in space. The light source is therefore calibrated once and all random speckle patterns are saved. When an object is placed in this area, the speckle pattern on the object's surface is observed and matched against the saved patterns to find the object's position. Thus, a depth image of the scene can be obtained.

Coordinate Transformation
Camera space refers to the 3D coordinate system used by the Kinect. The origin (x = 0, y = 0, z = 0) is located at the center of the Kinect's infrared camera. The x-axis points to the left of the Kinect's irradiation direction, the y-axis points upward, and the z-axis points along the irradiation direction.
The coordinate system of the depth image has its origin at the infrared camera. The positive x-axis and y-axis directions are horizontal to the right and vertical downward, respectively; the z-axis points along the camera's optical axis and satisfies the right-hand rule. Its point type is DepthImagePoint (x, y, z). The point type of the bone tracking coordinate system is SkeletonPoint (x, y, z), where (x, y, z) are coordinates in camera space. Only the x and y values are used as the image coordinates of a bone joint point; the z value is related to the distance from the object to the Kinect, and the smaller the z value (the closer the distance), the larger the bone image. Because Kinect does not use the same camera to collect depth images and color images, the corresponding coordinate systems are different and must be transformed into one another. The Kinect SDK provides MapDepthToSkeletonPoint, MapSkeletonPointToDepth, MapDepthToColorImagePoint, and MapSkeletonPointToColor to transform between coordinate systems.

Proposed Posture Recognition Method
The Kinect camera integrates an infrared transmitter, an RGB camera, and an infrared receiver, as shown in Figure 1. The best working range is 1.2-3.5 m, the horizontal field of view of the RGB camera is 57°, the vertical field of view is 43°, and the frame rate is 30 Hz, which ensures high accuracy even during a fast scan.
The human posture recognition algorithm is mainly composed of skeleton acquisition, angle measurement, angle matching, and posture recognition. The flow of the algorithm is shown in Figure 2. Firstly, the human skeleton is obtained and the spatial coordinates of the skeleton joints are calculated. Then, it calculates the distance between the joints and the angle between the joints. Finally, the calculated angle is matched with the angle template in the posture library to evaluate the posture recognition.
Kinect can provide the three-dimensional coordinates of 20 bone joints of the human body. Figure 3 is the skeleton diagram of the human body.

Calculating the Distance between the Joints
In the above analysis, 20 key joints of the human body have been obtained. Next, the distance between two key joints is computed. First, the scene depth information obtained by Kinect is used to calculate the actual distance between the person and the camera. In reference [17], the raw depth value d_raw is converted into the actual distance d from the target to the Kinect sensor:

d = K·tan(H·d_raw + L) − O,   (1)

where H = 3.4 × 10⁻⁴ rad, K = 12.35 cm, L = 1.18 rad, and O = 3.7 cm. The transformation from the pixel coordinates (x_image, y_image, z_image) of the depth image to the actual coordinates (x_real, y_real, z_real) is

x_real = (x_image − w/2)(z_image + D′)F,
y_real = (y_image − h/2)(z_image + D′)F,
z_real = z_image,   (2)

where D′ = −10 and F = 0.0021 according to abundant experiments, and the resolution of Kinect is w × h = 640 × 480. Let X(x₁, x₂, x₃) and Y(y₁, y₂, y₃) be two points in the spatial coordinate system. Combining equations (1) and (2), the actual coordinates of each joint can be obtained, and the Euclidean distance then gives the distance between the joints.
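As a minimal Python sketch of this pipeline, the raw-depth conversion, the pixel-to-world mapping, and the joint distance computation can be written as follows (the function names are ours, introduced only for illustration; the constants are those reported in the text):

```python
import math

# Calibration constants from the text: H, L in radians; K, O in cm;
# D' and F determined experimentally; w x h is the depth resolution.
H, K, L, O = 3.4e-4, 12.35, 1.18, 3.7
D_PRIME, F = -10.0, 0.0021
W, HGT = 640, 480

def raw_depth_to_cm(d_raw):
    """Equation (1): raw Kinect depth value -> distance in cm."""
    return K * math.tan(H * d_raw + L) - O

def image_to_real(x_img, y_img, z_img):
    """Equation (2): depth-image pixel -> real-world coordinates."""
    x_real = (x_img - W / 2) * (z_img + D_PRIME) * F
    y_real = (y_img - HGT / 2) * (z_img + D_PRIME) * F
    return (x_real, y_real, z_img)

def joint_distance(p, q):
    """Euclidean distance between two joints in real coordinates."""
    return math.dist(p, q)
```

A pixel at the image center maps to x_real = y_real = 0, as expected for a camera-centered coordinate system.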
Calculating the Angle

The three-point method (three joints) is mainly used to solve the angles between human body connection points. The actual positions of the key joints calculated by formula (2) are used to compute the distances among the three related key joints, as shown in Figure 4. The angle between the connection points is then calculated by the law of cosines (equation (5)). The main disadvantage of this method for human posture recognition is that the instability of an occluded joint has a great impact on the angle measurement, resulting in inaccurate posture recognition. Figure 5(a) shows the angle measurement effect of the three-point method.
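The law-of-cosines angle computation can be sketched as follows (a minimal Python illustration; `joint_angle` is our name, with the middle joint as the vertex):

```python
import math

def joint_angle(p1, p2, p3):
    """Angle at the vertex joint p2 (in degrees) formed by the joints
    p1-p2-p3, computed with the law of cosines."""
    a = math.dist(p1, p2)
    b = math.dist(p3, p2)
    c = math.dist(p1, p3)
    cos_theta = (a * a + b * b - c * c) / (2 * a * b)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

For an elbow, p1, p2, and p3 would be the shoulder, elbow, and wrist joints, so a fully extended arm yields an angle near 180°.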
Figure 1: The Kinect device, which integrates an angle control motor, a microphone, an infrared transmitter, an RGB camera, and an infrared receiver.

Posture Definition
Equation (6) is used to define the angle condition of a joint: taking P₁(x₁, y₁) as the center, the angle between the joint P₂(x₂, y₂) and the x-axis is θ, i.e.,

θ = arctan((y₂ − y₁)/(x₂ − x₁)),   (6)

and τ is the set angle threshold. Defining more postures only requires determining the angle relationships between the joints, and different thresholds can be set to meet different precision requirements. Let θᵢ (i = 1, 2, 3, 4) be the angles of the joints.
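The angle of a joint P₂ about the center P₁ relative to the x-axis can be computed with a two-argument arctangent, which also handles vertical segments; a minimal sketch (the function name is ours):

```python
import math

def joint_axis_angle(p1, p2):
    """Angle theta (degrees, in [0, 360)) between the segment from the
    center joint p1 to the joint p2 and the positive x-axis."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0])) % 360.0
```

Using atan2 instead of a plain arctangent avoids division by zero when P₂ lies directly above or below P₁ and resolves the correct quadrant.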

Human Body Posture Matching
In this paper, a threshold range for each angle is set up when building the posture library. The algorithm traverses all the angles and determines whether the four angles are within the specified thresholds. If so, the posture match is successful; that is, all angles satisfy equation (7):

|θᵢ − aᵢ| ≤ T, i = 1, 2, 3, 4,   (7)

where θᵢ is the measured angle, aᵢ is the set expected angle, and T is the threshold value. If any one of the angles does not satisfy the condition, the match fails and matching is resumed.
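The matching rule of equation (7) amounts to a per-angle threshold check; a minimal sketch in Python (the function name is ours):

```python
def match_posture(measured, template, T):
    """Equation (7): the posture matches only when every measured angle
    theta_i is within the threshold T of the expected angle a_i."""
    return all(abs(theta - a) <= T for theta, a in zip(measured, template))
```

Loosening T trades precision for robustness: a larger threshold tolerates measurement noise but may confuse similar postures.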

Experiments and Analysis
In this section, the algorithm tests seven actions from a physical education class at Shenyang Normal University. After completing the low link, the user can enter the high link. If a link is completed but the next link does not meet the requirements, training starts again from the first link.
We then test students' pull-ups using our method. Kinect's core technology is bone tracking, which allows the device to capture human motion and extract depth information. Microsoft Kinect adopts the Light Coding depth measurement technology. To obtain the spatial positions of the key joints, the human body in the depth image is segmented by a machine learning method, and the depth image is finally transformed into a bone image [18][19][20].
The conversion from depth image to bone image requires three steps: human body recognition, body part recognition, and joint recognition. The Kinect SDK can track the 3D coordinates of 25 bone points in real time at 30 frames per second. Kinect 2.0 can identify the locations of six people at the same time and give the complete skeletal information of two of them simultaneously. Each joint has one of three states, TRACKED, NOT TRACKED, or INFERRED, and a complete human skeleton connected by 25 nodes can be obtained.
The scoring module first extracts the user's motion information using the Kinect module and then extracts the angle features and position information from the data [21,22]. Since there is no time limit during the pull-up test, the score of each movement is calculated by preset scoring criteria and rules that judge whether the user's arm is straight and the positional relationship between the jaw and the bar. There are two indicators for evaluating a pull-up: the distance between the lower jaw and the bar when the body is at its highest point, and the bending angle of the arm at the end of the movement.
By calculating the distance difference Δhᵢ between the height hᵢ of the mandible when the human body is at the highest point and the height of the actual crossbar, the score Ascoreᵢ is obtained, where δᵢ is the variable used to determine the scoring threshold within each distance interval.

The score Bscoreᵢ is set similarly from the bending angle of the arm: the angle difference ΔAngleᵢ between the bending angle of the i-th elbow joint and the angle at full extension is measured, and the angles of the left and right elbows are averaged. The relation between the two angles can be expressed by a function f(x). Since in the actual test it is impossible to ask everyone to fully extend their arms, f(x) is a piecewise function that sets a certain threshold range.

According to the national physical test standards, the distance between the lower jaw and the crossbar and the straightness of the arm must be considered together to score the pull-up. Therefore, the final score is a weighted combination of the two, i.e.,

Scoreᵢ = α·Ascoreᵢ + β·Bscoreᵢ,

where α and β are the weight coefficients of Ascoreᵢ and Bscoreᵢ, respectively. The weights of the two indexes can be changed by selecting different coefficients; in this system, α = β = 1.

The system counts and scores according to the positional relationship between the user's jaw and the crossbar and the bending angle of the arm. When the lower jaw is detected crossing the bar and the arm bending angle is within a certain threshold range, a count is made and the corresponding score is given; in other cases no count is made, but a corresponding score is still given. The following observations show the counting and scoring results in several situations (the full score is 10).
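The combined counting and scoring logic can be sketched as follows. Note that this is a hypothetical illustration: the interval boundaries, the δᵢ thresholds, the exact form of f(x), and the function name are our assumptions, since the paper does not fully specify them.

```python
def pull_up_score(jaw_h, bar_h, left_elbow, right_elbow, alpha=1.0, beta=1.0):
    """Count and score one pull-up repetition (illustrative sketch).

    Ascore comes from the jaw-to-bar distance at the highest point and
    Bscore from the mean elbow bending angle (180 degrees = straight arm).
    All interval boundaries below are illustrative, not the paper's values.
    """
    dh = jaw_h - bar_h                       # distance difference (cm)
    a_score = 5 if dh >= 0 else (3 if dh > -5 else 1)

    # Piecewise f(x) on the mean bend angle of the two elbows.
    mean_angle = (left_elbow + right_elbow) / 2.0
    if mean_angle >= 160:                    # nearly full extension
        b_score = 5
    elif mean_angle >= 145:
        b_score = 3
    else:
        b_score = 1

    # A repetition counts only when the jaw clears the bar AND the arm
    # bend lies within the accepted threshold range.
    counted = dh >= 0 and mean_angle >= 160
    return counted, alpha * a_score + beta * b_score
```

With α = β = 1, a repetition in which the jaw clears the bar and both arms straighten fully receives the full score, matching the structure of the weighted formula above.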
The system detects that when the user's body is at the highest point, the lower jaw crosses the crossbar, the bending angle of the left arm is 172°, and that of the right arm is 163°. The movement is counted once, and the scoring interval is determined by the bending angle of the arm. Synthesizing Ascoreᵢ and Bscoreᵢ, the movement obtains ten points, as shown in Figure 6.

The system detects that when the user's body is at the highest point, the lower jaw crosses the crossbar, the bending angle of the left arm is 151°, and that of the right arm is 148°. The movement is not counted, and the scoring interval is determined by the bending angle of the arm. Synthesizing Ascoreᵢ and Bscoreᵢ, it obtains five points, as shown in Figure 7.

The system detects that when the user's body is at the highest point, the lower jaw does not cross the crossbar, the bending angle of the left arm is 171°, and that of the right arm is 160°. The movement is not counted, and the scoring interval is determined by the bending angle of the arm. Synthesizing Ascoreᵢ and Bscoreᵢ, it obtains six points, as shown in Figure 8.

The system detects that when the user's body is at the highest point, the lower jaw does not cross the crossbar, the bending angle of the left arm is 158°, and that of the right arm is 161°. The movement is not counted, and the scoring interval is determined by the bending angle of the arm. Synthesizing Ascoreᵢ and Bscoreᵢ, it obtains two points, as shown in Figure 9.
Under natural laboratory conditions, 100 groups of experimental data are selected for the experiment. The accuracy and real-time performance of body recognition are tested, and the experimental results are shown in Tables 2 and 3. It can be seen from the tables that the recognition accuracy of the proposed method exceeds 88%.
The proposed method is compared with three other body recognition methods: DTW [23], IKS [24], and ConvNets [25]. The indicators are accuracy and time, where accuracy refers to the proportion of correctly recognized samples among the total test samples.

Conclusions
The algorithm is developed with Microsoft Visual Studio 2010 and the Microsoft Kinect SDK 1.7. The experiments show that the method can measure the angles between skeleton joints in real time and identify the posture of the human body accurately. The algorithm is simple and accurate. Moreover, different angle ranges can be set according to the requirements of different postures, so the reusability is strong. Although the Kinect sensor can obtain the depth information of the human body and calculate its spatial position, it is not accurate enough when joints coincide or occlude each other. Therefore, while paying attention to the development of human behavior analysis, we should also study problems such as skeleton correction to further improve the accuracy of the skeleton. In the future, we will adopt deep learning and artificial intelligence methods to further improve the quality of physical training for everyone.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.