Human Motion Capture Based on Incremental Dimension Reduction and Projection Position Optimization

Three-dimensional (3D) human motion capture is currently an active research topic. With the development of networks, 3D human motion has become indispensable in multimedia works such as images, videos, and games, and it plays an important role in the publication and expression of all kinds of media. Capturing 3D human motion is therefore a key technology for multimedia products. This paper proposes a new algorithm called incremental dimension reduction and projection position optimization (IDRPPO). The algorithm learns from sparse 3D human motion samples and generates new ones, which provides a technique for making 3D character animation. Using the Gaussian incremental dimension reduction model (GIDRM) and projection position optimization, the proposed algorithm learns the existing samples and establishes mappings between the low-dimensional (LD) data and the high-dimensional (HD) data. Finally, IDRPPO generates both the missing frames of the input 3D human motion and another type of 3D human motion.


Introduction
Three-dimensional (3D) human motion capture is applied in many fields, such as medical diagnosis, animation production, and 3D video game development [1][2][3]. Generating human motion in 3D is crucial to these applications. Human motion in 3D is described by high-dimensional (HD) data, and a motion sequence consists of poses, each of which can be represented by a human motion model. One complete motion cycle is called a gait.
3D human motion capture has become an active research topic, and there are various techniques for generating human motion in 3D. One popular technique reconstructs 3D human motion from an image sequence, which requires complex preprocessing to extract image features and analyze feature samples, as in video event analysis [4] and video feature analysis [5]. Another is 3D human motion estimation by self-supervised learning, which learns sparse samples of one type of human motion and generates another type. This article mainly discusses the self-supervised approach, which can be seen, to some extent, as a special case of unsupervised learning. Existing methods have several defects. In [6,7], heuristic algorithms process preprocessed images to generate human motion; this is time-consuming, the quality of the generated motion depends on the quality of the image preprocessing, and both accuracy and efficiency are low. Some dimension reduction models [8][9][10][11] process human motion efficiently, but they can only visualize the HD motion data in a low-dimensional (LD) space. Some improved dimension reduction models [12] have two mappings between the LD space and the HD space, which can generate LD data samples for transforming HD data samples. These models help to generate human motion, but they cannot obtain another type of human motion. An improved method in [13] fits the human motion sequence, but it requires processing the LD data in the LD space, which makes generating human motion more difficult. None of these methods can quickly obtain one type of motion directly from another. In summary, generating one type of motion directly from another in a short time is not an easy task.
CNNs [14] and related networks have emerged in recent years (e.g., ResNet [15], AlexNet [16], VGG [17], SqueezeNet [18], DenseNet [19], and Inception [20]), but these networks require long training times, large datasets, and a large hardware budget, often including high-end, costly GPUs. Thus, a new machine learning method is needed that is suitable for quickly producing the animation of 3D human characters. The proposed method can also generate new valid training data and the corresponding pseudolabel (self-encoded) data (LD data), which can be used to retrain the model and improve its predictions. In general, it can improve the self-supervised learning model. Because the proposed method processes a data sequence directly as a matrix, it can, to a certain extent, improve the performance of some tracking and estimation frameworks, such as self-supervised deep correlation tracking (self-SDCT) [21]. Without manual annotation, the proposed method can obtain new essential samples according to the data requirements of a self-supervised learning model and let the model update its generating mapping with these samples, which improves tracking or estimation.
In this paper, a new algorithm called incremental dimension reduction and projection position optimization (IDRPPO) is proposed to address the problems mentioned above. It can generate one type of human motion from another, and the input motion samples can be an incomplete gait. The experimental tests of visual effect and error show that IDRPPO performs promisingly. IDRPPO uses the Gaussian incremental dimension reduction model (GIDRM) [7] and projection position optimization to carry out self-supervised learning on small-scale samples. GIDRM is similar to the bilinear analysis model of compound rank-k projections (CRP) [22]. Inspired by CRP, GIDRM is adopted to process the complex HD data of 3D human motion and to make these data visualized and regularized. First, GIDRM processes matrices directly without transforming them into vectors, which reduces the computational complexity and improves the flexibility of the model; the matrix can denote the HD sample sequence of human motion or the corresponding LD data sequence. Second, GIDRM provides an LD space in which the optimal LD data sample can be searched for and generated, so that the corresponding 3D human motion can be reconstructed through its mappings. These two advantages are essential to the efficiency of IDRPPO in estimating 3D human motion. Thus, IDRPPO with GIDRM can learn one type of incomplete gait and then output both the missing frames of that gait and another type of motion.
Our contributions are as follows: (1) we address the problem of filling the missing frames in an incomplete motion cycle so that the cycle becomes complete and smooth; (2) we address the problem of generating another type of motion cycle from the original incomplete motion cycle with IDRPPO. The experiments test the performance of IDRPPO, and the results indicate that IDRPPO achieves a promising visual effect and a low estimation error for human motion capture. The technique framework of IDRPPO is shown in Figure 1, and the details of IDRPPO are discussed in the following sections.

Generation of Human Motion through IDRPPO

Gaussian Incremental Dimension Reduction Model.
According to the references above, the models can be given as follows [7,12]:
[Figure 1: Technique framework of IDRPPO. GIDRM builds the LD space, obtains the LD data $X_I$ of the incomplete motion cycle $Y_I$, and builds the mapping $f_1$ from $X_I$ to $Y_I$. Projection position optimization repairs the LD data of the missing poses (green) between the broken first and last poses, starting from random positions in the LD space; the updated $X_I$ and $f_1$ then generate the missing poses, completing $Y_I$ as a smooth gait, after which the mapping $g$ from $Y_I$ to $X_I$ is built. Finally, with $X_I$ fixed, GIDRM builds the mapping $f_2$ from $X_I$ to the running motion cycle $Y_{II}$.]

Wireless Communications and Mobile Computing
From Equations (1) and (2), the HD data sequence is modeled through the kernel matrix $K_Y$ with its kernel parameters, together with a second kernel matrix. In Equations (1) and (2), $Y$ is known; thus, $p(Y)$ is constant, and the following equivalence holds:
$$\min\left(-\ln p(X,\alpha,\beta,W \mid Y)\right) \Leftrightarrow \min\left(-\ln p(X,\alpha,\beta,W,Y)\right).$$
The LD data and the corresponding parameters are obtained from this minimization (Equation (3)). Since $y \sim \mathcal{N}\left(\mu_Y(x), \sigma_Y^2(x) I\right)$ with $y \in \mathbb{R}^D$ and $x \in \mathbb{R}^q$, the mapping from the LD space to the HD space can then be built (Equation (4)). If two or more mappings from the LD space to the HD space are needed, Equation (3) can be retrained accordingly: after the first mapping is built, the LD data obtained from it are fixed and serve as the initial LD data for training the second mapping.
The mapping of the incremental dimension reduction (Equation (5)) uses the radial basis function matrix $\Phi \in \mathbb{R}^{N \times N_k}$, with $\Phi_{k_1,k_2} = \phi(y_{k_1}, c_{k_2})$. Let $y^* \in \mathbb{R}^D$ denote a new HD data sample and $x^* \in \mathbb{R}^q$ its LD data. If $b$ is known, the mapping from $y^*$ to $x^*$ (Equation (6)) uses
$$\Phi(y^*) = \left[\phi(y^*, c_1), \phi(y^*, c_2), \cdots, \phi(y^*, c_{N_k})\right],$$
which leads to Equation (7). In Equation (7), $e \in \mathbb{R}^{N \times N_k}$ is the error matrix; letting $e = [e_1, \cdots, e_{k_1}, \cdots]$ gives Equation (8), which can be rewritten as Equation (9). By the properties of least squares, $W^{T} e = 0$ and $e^{T} W = 0$, which gives Equation (10). During training, the $N_k$ orthogonal vectors can be replaced, which gives Equation (11), or equivalently Equation (12). In Equation (12), $S_w = \{w_1, \cdots, w_{k_2}, \cdots, w_{N_k}\}$ and $S_w' = \{w_1', \cdots, w_{k_2}', \cdots, w_N'\}$ are both sets of orthogonal vectors.
When the tolerance $\|X - \Phi W_D\|^2/(N \times q) < \varepsilon_1$, $\varepsilon_1 > 0$, is satisfied, the training is finished. In other words, as few vectors $\varphi_{k_2}'$ as possible are selected, which minimizes $N_k$ while still satisfying the tolerance, and the mapping training is then complete.
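Selecting orthogonal vectors one at a time until the tolerance is met closely resembles orthogonal least squares (OLS) forward selection of RBF centers. Because Equations (7) to (12) are not reproduced here, the following pure-Python code is only a sketch under stated assumptions, not the authors' implementation. It assumes a Gaussian $\phi(y, c) = \exp(-\|y-c\|^2/(2s^2))$ with a hypothetical width `s`, takes the candidate centers from the training samples, and stops once the mean squared residual drops below `eps1`. The function names `ols_select`, `fit_weights`, and `g_map` are hypothetical, and the weights are solved by the normal equations rather than by triangular back-substitution.

```python
import math

def rbf(y, c, s=1.0):
    # Gaussian radial basis function phi(y, c); the width s is an assumption
    d2 = sum((a - b) ** 2 for a, b in zip(y, c))
    return math.exp(-d2 / (2.0 * s * s))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ols_select(Y, X, eps1=1e-3, s=1.0):
    """Greedy OLS choice of RBF centers among the rows of Y.

    Y: N x D HD samples, X: N x q LD data. Returns (selected, mse).
    """
    N, q = len(Y), len(X[0])
    # Candidate column j holds phi(y_i, y_j) for i = 1..N
    cols = [[rbf(Y[i], Y[j], s) for i in range(N)] for j in range(N)]
    # Residual, stored as one length-N column per LD dimension
    resid = [[X[i][k] for i in range(N)] for k in range(q)]
    basis, selected = [], []
    mse = sum(dot(r, r) for r in resid) / (N * q)
    while mse >= eps1 and len(selected) < N:
        best, best_gain, best_w = None, 0.0, None
        for j in range(N):
            if j in selected:
                continue
            w = cols[j][:]
            for b in basis:  # modified Gram-Schmidt against chosen vectors
                coef = dot(w, b) / dot(b, b)
                w = [wi - coef * bi for wi, bi in zip(w, b)]
            ww = dot(w, w)
            if ww < 1e-12:
                continue  # numerically in the span of the chosen columns
            # Drop in the squared residual if w were added
            gain = sum(dot(w, r) ** 2 for r in resid) / ww
            if gain > best_gain:
                best, best_gain, best_w = j, gain, w
        if best is None:
            break
        selected.append(best)
        basis.append(best_w)
        ww = dot(best_w, best_w)
        # Remove the component of the residual along the new basis vector
        resid = [[ri - dot(best_w, r) / ww * wi for ri, wi in zip(r, best_w)]
                 for r in resid]
        mse = sum(dot(r, r) for r in resid) / (N * q)
    return selected, mse

def solve(A, B):
    # Gauss-Jordan elimination with partial pivoting: returns A^{-1} B
    m = len(A)
    M = [A[i][:] + B[i][:] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [[v / M[i][i] for v in M[i][m:]] for i in range(m)]

def fit_weights(Y, X, selected, s=1.0):
    # Least-squares W for X ~ Phi_S W via the normal equations
    cols = [[rbf(y, Y[j], s) for y in Y] for j in selected]
    xcols = [[x[k] for x in X] for k in range(len(X[0]))]
    A = [[dot(ca, cb) for cb in cols] for ca in cols]
    B = [[dot(ca, xk) for xk in xcols] for ca in cols]
    return solve(A, B)  # Nk x q

def g_map(y_star, Y, selected, W, s=1.0):
    # x* = Phi(y*) W with Phi(y*) = [phi(y*, c_1), ..., phi(y*, c_Nk)]
    phi = [rbf(y_star, Y[j], s) for j in selected]
    return [sum(p * W[a][k] for a, p in enumerate(phi)) for k in range(len(W[0]))]
```

A larger `eps1` stops the selection earlier and keeps $N_k$ small, which matches the goal stated above of choosing as few vectors as possible.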

Projection Position Optimization.
The learning of an incomplete gait of human motion requires projection position optimization in the LD space. Let $\mathrm{Prj}_{\overrightarrow{AB}}$ denote the projection operation onto the vector $\overrightarrow{AB}$, where $A$ is the known LD data point just before the missing human motion sequence, $B$ is the known LD data point just after it, and $C_i$, $i = 1, 2, \cdots, N_{\text{miss}}$, denotes the LD data of the missing frames. According to Figure 2, Equations (13) and (14) hold. In Equation (14), $c$ is a preset parameter that denotes the distance between a missing dot and its projection dot in Figure 2 after dimension reduction. The positions of the missing frames should satisfy Equations (13) and (14), so that Equation (3) can be trained optimally during the second training. From Equations (13) and (14), the objective function (Equation (15)) and its gradient (Equation (16)) are obtained. In Equation (16), "$\bullet$" denotes the entrywise product of matrices. Equation (15) does not have a unique solution, but every solution preserves the relative position of each missing frame in the LD space during training. Thus, the second training can start from the optimized LD data of the missing frames.
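Equations (13) to (16) are not reproduced here, so the following code is a hypothetical reconstruction that only illustrates the idea. It assumes that Equation (13) places the projection of $\overrightarrow{AC_i}$ onto $\overrightarrow{AB}$ at the fraction $i/(N_{\text{miss}}+1)$ of $\overrightarrow{AB}$. A second term enforces the distance $c$ from Equation (14) between each missing dot and its projection dot. Starting from random positions, as in the framework, plain gradient descent minimizes the sum of squared deviations. The names `objective_and_grad` and `optimize_missing`, the step size `lr`, and the iteration count are illustrative assumptions.

```python
import math
import random

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def objective_and_grad(C, A, B, c):
    """Hypothetical stand-in for Equations (15) and (16).

    Term 1 places the projection of AC_i at fraction i/(N_miss+1) of AB
    (assumed form of Equation (13)); term 2 asks the distance between C_i
    and its projection dot to equal c (Equation (14)).
    """
    AB = sub(B, A)
    ab2 = dot(AB, AB)
    n = len(C)
    J, grads = 0.0, []
    for i, Ci in enumerate(C, start=1):
        AC = sub(Ci, A)
        t = dot(AC, AB) / ab2                    # projection coefficient
        r = [a - t * b for a, b in zip(AC, AB)]  # component perpendicular to AB
        d = math.sqrt(dot(r, r))                 # distance to the projection dot
        tau = i / (n + 1)
        J += ab2 * (t - tau) ** 2 + (d - c) ** 2
        g = [2 * (t - tau) * b for b in AB]
        if d > 1e-12:
            g = [gi + 2 * (d - c) * ri / d for gi, ri in zip(g, r)]
        grads.append(g)
    return J, grads

def optimize_missing(A, B, n_miss, c, steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    # Random initial LD positions for the missing frames
    C = [[rng.uniform(-1, 1) for _ in A] for _ in range(n_miss)]
    for _ in range(steps):
        _, grads = objective_and_grad(C, A, B, c)
        C = [[x - lr * gx for x, gx in zip(Ci, gi)] for Ci, gi in zip(C, grads)]
    return C, objective_and_grad(C, A, B, c)[0]

C, J = optimize_missing([0.0, 0.0], [4.0, 0.0], n_miss=3, c=0.5)
```

In this 2D example, each $C_i$ may settle on either side of $\overrightarrow{AB}$ depending on its random start. This matches the point above that the solution is not unique while the relative positions along the gap are preserved.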

(1) Use Equation (3) to reduce the dimension of $Y_I$, which contains missing frames, and obtain $X_1$ and the corresponding training parameters (the external and internal iteration numbers of this step are $S_{11}$ and $S_{12}$, respectively).
(2) Process $X_1$ with projection position optimization, that is, minimize Equation (15) with the help of Equation (16) (the iteration number of this step is $S_{21}$).
(3) Substitute the training parameters from step 1 and the $X_1$ processed in step 2 into Equation (3) for the second training, which yields the training parameters, the updated $X_1$, and the mapping $f_1$ from $X_1$ to $Y_I$. The missing frames in $Y_I$ are generated from the $X_1$ processed in step 2. Next, build the mapping $g$ from $Y_I$ to $X_1$ through Equation (5) (the external and internal iteration numbers of building $f_1$ are $S_{31}$ and $S_{32}$, respectively, and the iteration number of building $g$ is $N_k$, with $N_k \le N$).
(4) Build the mapping $f_2$ from $X_1$ to $Y_{II}$ through Equation (3), where $X_1$ is obtained from step 3 and kept fixed during this training (the external and internal iteration numbers of building $f_2$ are $S_{41}$ and $S_{42}$, respectively).
(5) When a new sample $y_I'$ arrives, generate $y_{II}'$ by $y_{II}' = f_2(g(y_I'))$.
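Step (5) composes the two learned mappings. The following pure-Python code is a sketch under stated assumptions. It assumes that $f_2$ returns the mean of $y \sim \mathcal{N}(\mu_Y(x), \sigma_Y^2(x) I)$ using the standard Gaussian process predictive formulas $\mu_Y(x) = Y^T K^{-1} k(x)$ and $\sigma_Y^2(x) = k(x,x) - k(x)^T K^{-1} k(x)$ with an RBF kernel. The kernel parameters `scale` and `inv_width`, the `noise` jitter, and the names `f2` and `generate` are hypothetical. The HD-to-LD mapping `g` is passed in as any callable, for example one built by the incremental dimension reduction.

```python
import math

def kern(a, b, scale=1.0, inv_width=1.0):
    # RBF kernel; scale and inv_width stand in for the kernel parameters
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return scale * math.exp(-0.5 * inv_width * d2)

def solve(A, B):
    # Gauss-Jordan elimination with partial pivoting: returns A^{-1} B
    m = len(A)
    M = [A[i][:] + B[i][:] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [[v / M[i][i] for v in M[i][m:]] for i in range(m)]

def f2(x_star, X, Y, noise=1e-6):
    """GP predictive mean and variance of y at the LD point x_star.

    X: N x q LD data (kept fixed), Y: N x D HD data of the target motion.
    """
    N, D = len(X), len(Y[0])
    K = [[kern(X[i], X[j]) + (noise if i == j else 0.0) for j in range(N)]
         for i in range(N)]
    k = [kern(xi, x_star) for xi in X]
    # One elimination solves K [A | v] = [Y | k]
    sol = solve(K, [Y[i] + [k[i]] for i in range(N)])
    mu = [sum(k[i] * sol[i][d] for i in range(N)) for d in range(D)]
    var = kern(x_star, x_star) - sum(k[i] * sol[i][D] for i in range(N))
    return mu, var

def generate(y_I_new, g, X, Y_II):
    # Step (5): y_II' = f2(g(y_I')); g is any HD-to-LD mapping
    return f2(g(y_I_new), X, Y_II)[0]
```

At a training LD point, the predicted pose reproduces the stored $Y_{II}$ frame up to the small `noise` term. Far from all LD data, the mean falls back to zero and the variance rises to `scale`.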
The computational complexity of the whole algorithm usually depends on the iteration number of each step. Denote the time complexity by $O(\cdot)$ and ignore data preprocessing and matrix calculations, which are not core steps of the proposed algorithm. The computational complexity is then $O(S_{11}S_{12} + S_{21} + S_{31}S_{32} + N_k + S_{41}S_{42})$, so it is determined by whichever of these iteration terms reaches the largest magnitude.
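As a worked example of this bound, the following code adds up the five iteration terms and reports the largest one. The iteration numbers in the example call are hypothetical values chosen for illustration, not settings from the experiments.

```python
def iteration_cost(S11, S12, S21, S31, S32, Nk, S41, S42):
    """Sum the iteration terms of O(S11*S12 + S21 + S31*S32 + Nk + S41*S42)."""
    terms = {
        "step 1: dimension reduction of Y_I": S11 * S12,
        "step 2: projection position optimization": S21,
        "step 3: second training (f_1)": S31 * S32,
        "step 3: mapping g": Nk,
        "step 4: mapping f_2": S41 * S42,
    }
    total = sum(terms.values())
    dominant = max(terms, key=terms.get)  # term with the largest magnitude
    return total, dominant

# Hypothetical iteration numbers, not taken from the experiments
total, dominant = iteration_cost(20, 100, 500, 20, 100, 40, 30, 100)
print(total, dominant)  # 7540 step 4: mapping f_2
```

With these values, the products $S_{11}S_{12}$, $S_{31}S_{32}$, and $S_{41}S_{42}$ outweigh $S_{21}$ and $N_k$. The nested external and internal iterations therefore set the cost.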

Experiment and Evaluation
Most heuristic algorithms and dimension reduction models cannot generate one type of human motion from another. Optimizing the projection positions is the key to generating the human motion. Therefore, the algorithm that uses incremental dimension reduction without projection position optimization, called IDRNPPO, is used for comparison. Both IDRNPPO and IDRPPO generate human motion in the experimental tests, and their performance is evaluated by the visual effect and the error of the missing frames and the generated poses. The missing frames are taken from a walking motion, and the generated motion is a running motion produced from the walking motion.
As shown in Figure 3, the running poses from IDRPPO have a better visual effect than those from IDRNPPO. The 30th, 35th, 40th, 45th, 48th, 52nd, and 58th frames from IDRNPPO are identical, so they cannot form a smooth sequence that shows the running process. Likewise, Figure 4 shows that the missing frames in the input motion produced by IDRNPPO are also identical and cannot display a smooth walking sequence. In contrast, the running motion and the missing walking motion from IDRPPO form smooth sequences.
As shown in Figure 5, the LD data of the missing frames from IDRNPPO and IDRPPO, shown in green in Figure 5(a) and Figure 5(b), respectively, are clearly different. Without projection position optimization, the points from IDRNPPO form a messy curve in which they are difficult to distinguish. In contrast, the points from IDRPPO are neat and smooth and fill in the missing part of the whole curve. Figure 5 thus also explains why the missing frames generated by IDRPPO form a smooth motion sequence. Overall, Figures 3, 4, and 5 indicate that IDRPPO performs better than IDRNPPO.
3.2. The Error of the Generation. The errors of IDRPPO and IDRNPPO are shown in Figure 6; the error is calculated as in [24]. As Figure 6 shows, the errors of the generated running motion and the missing walking motion from IDRPPO are lower overall than those from IDRNPPO. Some frames are expected to have similar errors for both methods in Figure 6(a), because IDRNPPO reproduces some frames of the running motion correctly. However, the overall tendency of the errors can be evaluated by the mean error, and as depicted in Figure 6(b), the mean error of IDRPPO is lower than that of IDRNPPO. Table 1 shows that the measured runtimes are 8.28 seconds for IDRNPPO and 9.51 seconds for IDRPPO, so projection position optimization adds only 1.23 seconds. Overall, the results in Figure 6 again show that IDRPPO generates motion better than IDRNPPO.
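The error metric itself is defined in [24] and is not reproduced here. The following code is therefore only an illustration of how per-frame errors (as plotted in Figure 6(a)) and their mean (as in Figure 6(b)) relate. It assumes, hypothetically, that the error of one frame is the mean Euclidean distance between corresponding 3D joint positions, which may differ from the definition in [24].

```python
import math

def frame_error(pred, truth):
    # Assumed metric: mean Euclidean distance between corresponding joints
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def sequence_errors(pred_seq, truth_seq):
    """Per-frame errors and their mean over the whole sequence."""
    errs = [frame_error(p, t) for p, t in zip(pred_seq, truth_seq)]
    return errs, sum(errs) / len(errs)

pred = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (0, 3, 4)]]
truth = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
errs, mean = sequence_errors(pred, truth)
print(errs, mean)  # [0.0, 2.5] 1.25
```

As in Figure 6(a), individual frames can score equally well for both methods. The mean over all frames is what summarizes the overall tendency.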

Conclusion
IDRPPO is proposed to obtain 3D human motion. IDRPPO with GIDRM can learn an incomplete gait and generate another gait, which addresses shortcomings of some self-supervised and unsupervised algorithms. The experiments show that projection position optimization is crucial to the performance of IDRPPO. The experimental results indicate that IDRPPO is effective for making 3D character animation and helps to generate motion cycles quickly, and it can benefit small-scale self-supervised or unsupervised learning. However, IDRPPO cannot process complex and irregular human motion samples, which will be addressed in future research. The human motion model can also be replaced by a more advanced model [25], so that high-level multimedia products can be made with this technique.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interests regarding the publication of this paper.