Human Motion Estimation Based on Low Dimensional Space Incremental Learning

This paper proposes a novel algorithm, low dimensional space incremental learning (LDSIL), to estimate human motion in 3D from the silhouettes of multiview images of human motion. The proposed algorithm uses stochastic extremum memory adaptive searching (SEMAS) and an incremental probabilistic dimension reduction model (IPDRM) to collect new high dimensional data samples. Selected high dimensional data samples are used to update the mapping from the low dimensional space to the high dimensional space, so that incremental learning can be achieved and human motion can be estimated from a small number of samples. Compared with three traditional algorithms, the proposed algorithm performs well in disambiguating silhouettes, overcoming transient occlusion, and reducing estimation error.


Introduction
Human motion estimation has become a hot research topic [1][2][3], but it remains a challenging task. In particular, we are interested in estimating human motion in 3D from the silhouettes of multiview images of human motion. The main challenges are as follows: firstly, it is hard to build the mapping between multiview silhouettes and human motion in 3D; secondly, the matching between multiview silhouettes and human motion in 3D is ambiguous; finally, it is hard to determine the spatial position of the human motion depicted in the multiview images. In the past few years, a number of algorithms have been proposed to estimate human motion. Sigal et al. [4] and Deutscher and Reid [5] use improved particle filters to estimate human motion in 3D. These methods do not work well because they search (sample) in the high dimensional (HD) space many times. Searching directly in the HD space causes problems: searching on a large scale yields invalid data, whereas searching on a small scale fails to reach the target data. Moreover, searching many times on a small scale also produces invalid data. Li et al. use principal component analysis (PCA) to reduce the dimension of the HD data samples and simulated annealing particle swarm optimization (SAPSO) to estimate human motion in 3D [6]. This algorithm is time-consuming and does not perform well, because the HD data converted back from the corresponding low dimensional (LD) data differ considerably from the original HD data. Besides, it does not consider the spatial position of the human motion. Some traditional Monte Carlo methods [7][8][9] cannot ensure that the best sample is collected in each search; stochastic extremum memory adaptive searching (SEMAS) is proposed to solve this problem. In the work of Wang et al. [10], the Gaussian process dynamical model (GPDM) is used to reduce the dimension of the HD data, acquire the corresponding LD data, and build the mapping from the LD space to the HD space, but GPDM cannot quickly reduce the dimension of a new HD data sample to acquire its LD data; the incremental probabilistic dimension reduction model (IPDRM) is proposed, based on GPDM, to solve this problem. Some improved incremental or nonincremental learning algorithms [11][12][13][14] cannot satisfy our need.

Mathematical Problems in Engineering
The limitations are that their output data, which denote a class label or other simple information, have only one dimension and therefore cannot describe richer outputs, and that they cannot carry out unsupervised incremental learning of HD data. Inspired by the research above, we observe that the key to estimating human motion in 3D is to generate better prior information. Human motion in 3D can then be estimated more accurately by searching around this prior information on a small scale only once. In this paper, we mainly focus on regular, cyclic human motion (walking or running).
Our task is to use a small number of HD data samples as prior information to estimate the human motion in 3D that matches the multiview images. Building on the work above, we propose a novel algorithm called low dimensional space incremental learning (LDSIL). LDSIL uses SEMAS and IPDRM to collect new HD samples and updates the mapping from the LD space to the HD space with selected new HD samples, so that searching in the LD space generates more accurate HD data for estimating the corresponding human motion in 3D. SEMAS is used to find the spatial position of the human motion in 3D, and it finds the best data sample during searching more easily. IPDRM is used to reduce the dimension of a new HD data sample and acquire the corresponding LD data, and it helps to select the new HD samples through the mapping of incremental dimension reduction. Moreover, it provides the LD space in which valid HD data are generated. Based on IPDRM, the HD data samples for incremental learning are selected by comparing the corresponding LD data.
The main contributions of this paper are as follows:
(1) SEMAS is proposed to find the spatial position of the human motion model. It obtains the best data sample more reliably than traditional Monte Carlo methods.
(2) IPDRM is proposed to reduce the dimension of a new HD data sample and acquire the corresponding LD data. It supports incremental learning in the LD space to update the mapping from the LD space to the HD space. Besides, it provides the LD space in which valid HD data are generated through searching.
(3) A method of selecting the HD data samples is proposed; it is used to update the mapping from the LD space to the HD space.
Overall, because LDSIL makes use of LD data, it can solve the problems mentioned above and estimate human motion in 3D with better performance than the traditional algorithms in disambiguating silhouettes, overcoming transient occlusion, and reducing estimation error.
The rest of this paper is organized as follows. Section 2 introduces the data and models used to estimate the human motion. Section 3 proposes the SEMAS algorithm to find the spatial position of the human motion model. Section 4 proposes IPDRM to achieve incremental dimension reduction. Section 5 proposes orthogonal least squares learning of multiple outputs (OLSLMO) to learn the mappings (HD space to LD space and LD space to HD space). Section 6 discusses LDSIL based on Sections 2-5 and explains how to select new HD data samples from the estimated human motion models to achieve incremental learning in the LD space. Section 7 proposes the method of searching in the LD space to estimate the human motion model, which takes advantage of SEMAS and IPDRM. Section 8 demonstrates the validity of the proposed algorithm (LDSIL) through experiments and evaluations. Section 9 discusses the limitations of the LDSIL algorithm and future improvements.
The following sections give a more detailed discussion.

Corresponding Data and Models
We adopt the data and models used in [4,5]. All image data come from the HumanEva-I dataset [4], as shown in Figures 1(a)-1(d). Figure 1(a) shows the human motion model, which denotes the human motion in 3D and is described by HD data. This model is the object to be estimated, and it needs to match the limbs in the multiview images. Figure 1(b) shows the multiview images, which depict the human motion and its spatial position. After processing the multiview images with image segmentation algorithms [15][16][17], we obtain the silhouettes shown in Figure 1(c). Then, we project the model onto the corresponding views and obtain the projection images in Figure 1(d). The images in Figure 1(d) are compared with the images in Figure 1(c).
In the following, we give some definitions for the abovementioned data and models. The equation is built as below: where  denotes the th frame,  denotes the th view of camera, V is the number of views,  denotes the weight coefficient,  denotes the pixel of the image,   denotes the th view feature of object image, m t denotes multiview features of object image,  y   ∈   1 and up y   ∈   2 , respectively, denote the HD data samples which contain spatial position information (6 dimensions in total) and no spatial position information,  1 −  2 = 6, and thus (   |  y   ) is the conditional probability of matching image feature of the th view from the appearance of  y   . Usually, we let  = 100 and  = 0.5 in most cases. After dimension reduction, the corresponding LD data of the HD data samples with no spatial position information can be seen in Figure 1(e). The HD data samples are a small number of samples which cover one cycle of human motion.
For the better description of our proposed algorithm in the following section, we need to define the symbols of some operations as follows.y( 1 :  2 ) is extracting the  1 th- 2 th dimension data of vector y as a subvector; 1 A and 1 B are matrices or vectors, ones( 1 ,  2 ) ∈   1 × 2 is the matrix whose elements are 1; rand( 1 ,  2 ) ∈   1 × 2 is the matrix whose elements are conforming the distribution (−1, 1); the set can be described by is the calculation of 2-norm (Euclidean-norm), and  can denote the vector or matrix.

Stochastic Extremum Memory Adaptive Searching
We will propose SEMAS algorithm in this section, which is used to find the spatial position of human motion model.The spatial position information includes the angles and coordinates of root marker in the model, which is the low dimensional data.In time  (the th frame), let  y   ( 1 :  2 ) denote the spatial position information of  y   ∈   1 , the set   = { 1 ,  1 + 1, . . .,  2 } =  1 :  2 ,   ∪   = {1, 2, . . .,  1 }, and y   =  y   ( 1 :  2 ) and y  =  y   (  ); we get where the SEMAS can be carried out as follows.We denote   1 as the ordinal number of maximum element in W  = { Here, the extremum needs to be kept, let +1 y    =  y   1  , and, in the next time searching, it is recording the extremum for ensuring obtaining the data samples which is not worse than the ones in the last time searching.In each searching, we can get   data samples.After searching twice, we can adjust the value of  according to the equation as below: where when the subtraction of the best object values in the past two times searching is below some value,  can be enlarged to search better data samples; on the contrary, if the subtraction is above some value, it illustrates that the value of  can be shrunk to avoid missing the best data sample.Thus, the method can make the convergence faster.
After  times searching ( is large enough), when   (y   ) is unimodal approximately, the best sample y *  can be obtained by and the derivation of (4) and the pseudocode can be seen in the appendices.
We now compare the traditional Monte Carlo method with SEMAS, as depicted in Figures 2(a) and 2(b). In Figure 2, we take 4 searches with 3 data samples per search as an example. Figure 2(a) shows that the traditional Monte Carlo method cannot ensure that the data samples found in one search are not worse than those of the previous search; it simply keeps the data samples with larger weights. The reason is that it does not retain the best data sample of each search for comparison with the data samples of the next search; moreover, the length and direction of the searching vector are stochastic and uncontrolled. After several searches, in most cases, none of the searched data samples comes close to the maximum of the objective function, and the weighted mean of the samples, whose weights are the normalized objective values, cannot reach it either. As shown in Figure 2(b), SEMAS keeps the best data sample of each search and compares it with the data samples of the next search. This ensures that the best data sample of a search is never worse than that of the previous search. SEMAS also adjusts the length of the searching vector to control the searching scale, according to the difference between the best objective values of the past two searches; thus it approaches the maximum of the objective function more closely and has a better chance of obtaining good data samples than the traditional Monte Carlo method.
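The behaviour described in this section (carrying the extremum from one search to the next and adapting the searching scale from the last two best objective values) can be illustrated with a minimal Python sketch. It is not the pseudocode from the appendices: the function name `semas_search`, the parameters `threshold`, `grow`, and `shrink`, and the multiplicative scale update are assumptions made for illustration only.

```python
import random

def semas_search(objective, y_start, n_rounds=20, n_samples=10,
                 scale=1.0, threshold=1e-3, grow=1.5, shrink=0.5, seed=0):
    """Maximize `objective` with a stochastic search that remembers the extremum.

    All parameter names and the grow/shrink factors are illustrative assumptions.
    """
    rng = random.Random(seed)
    best = list(y_start)
    best_value = objective(best)
    history = [best_value]  # best objective value after each search
    for _ in range(n_rounds):
        # The remembered extremum competes with the new samples, so the best
        # sample of this search cannot be worse than that of the last search.
        candidates = [best]
        for _ in range(n_samples - 1):
            step = [scale * rng.uniform(-1.0, 1.0) for _ in best]
            candidates.append([b + s for b, s in zip(best, step)])
        values = [objective(c) for c in candidates]
        top = max(range(len(candidates)), key=values.__getitem__)
        best, best_value = candidates[top], values[top]
        history.append(best_value)
        # Adapt the scale from the difference of the last two best values:
        # enlarge it when the improvement is small, shrink it otherwise.
        if history[-1] - history[-2] < threshold:
            scale *= grow
        else:
            scale *= shrink
    return best, best_value, history
```

Because the previous best sample is always one of the candidates, the recorded best value is non-decreasing from search to search, which is the property that distinguishes this scheme from the plain Monte Carlo search in Figure 2(a).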

Incremental Probabilistic Dimension Reduction Model
4.1.Probabilistic Models of GPDM.We give the probabilistic models of the GPDM in [10] as follows: In (5), HD data sequence can be denoted by Y = [y 1 , . . ., y  , . . ., y  ]  ∈  × , y  ∈   , and LD data sequence can be denoted by x  ]  , and x 1 conforms Gaussian distribution of  dimensions. and  satisfy () ∝ ∏   −1  and () ∝ ∏   −1  , respectively.Then, we give the conditional distribution as follows: where 1) , and Then, the LD data and corresponding parameters can be calculated as below: Optimization of ( 9) can use the method of scaled conjugate gradient (SCG) [18,19].After the optimization, X can be depicted as shown in Figure 1(e).According to (7), , the mapping from LD data to HD data can be given by the mean as below:

The Mapping of Incremental Dimension Reduction.
After finishing training, GPDM can not process the new HD data sample, which means it can not acquire the corresponding LD data.From (10), y * is known, the x * is embedded in nonlinear kernel, and thus x * is hard to be solved through this equation.Thus, we need to build the mapping from HD data to LD data and acquire the LD data of new HD data sample fast.We denote HD data samples not containing spatial position information as Ỹ = [ up ỹ1 , . . ., up ỹ , . . ., up ỹ up ]  ∈   up × 2 ; then we denote the corresponding LD data as X = [x 1 , . . ., x , . . ., x up ]  ∈   up × , which can be acquired by GPDM.The mapping can be built through the training model as below: where W  ∈  × is the weight matrix and Φ ∈  × up is the matrix of Gaussian basis function; thus we have .Then, we have the mapping of incremental dimension reduction as below: where as to one HD new samples up y *  , let  up = 1, Φ( up y *  ) = [( up y *  , c 1 ), ( up y *  , c 2 ), . . ., ( up y *  , c  )]  , and  E =  B  ones(1, 1).The mapping as (12) can acquire the corresponding LD data of the new HD data sample; furthermore, it can acquire the multiple corresponding ones of the multiple HD data samples, which only need to add the corresponding column in Φ and  E. Equations ( 5)-( 12) constitute IPDRM; however, the learning method of ( 11) is the key to estimate the human motion in 3D; thus we will discuss it in the next section.
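To make the role of the mapping in (12) concrete, the following Python sketch evaluates a Gaussian basis function network that maps one new HD sample to its LD data. It only illustrates the structure "weight matrix times basis vector plus bias" described above; the basis form exp(-||y - c||^2 / (2 * width^2)), the names `gaussian_basis` and `map_hd_to_ld`, and the `width` parameter are assumptions, and the weights are taken as already learned.

```python
import math

def gaussian_basis(y, center, width):
    """Gaussian basis value for sample y and one center (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(y, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def map_hd_to_ld(y_new, weights, centers, bias, width=1.0):
    """Map one HD sample to LD data as weights * phi(y_new) + bias.

    weights: one row per LD dimension, one column per center.
    centers: list of HD center vectors c_1, ..., c_M.
    bias:    one offset per LD dimension.
    """
    phi = [gaussian_basis(y_new, c, width) for c in centers]
    return [sum(w * p for w, p in zip(row, phi)) + b
            for row, b in zip(weights, bias)]
```

Mapping several new HD samples only requires calling `map_hd_to_ld` once per sample, which corresponds to adding one column per sample to the basis matrix.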
We now discuss several advantages of IPDRM. Firstly, the model builds the LD space from the known HD data samples, acquires the LD data of a new HD data sample, and provides global prior information about the human motion in the LD space. The searching position in the LD space can be determined through clustering to disambiguate silhouettes, because the mapping from the HD space to the LD space is one-to-one. Secondly, it reduces the estimation error and overcomes transient occlusion through linear and nonlinear searching in the LD space guided by the global prior information. Moreover, the HD data samples generated by large scale nonlinear searching between two neighbouring LD data in the LD space are better and more valid than those generated by large scale nonlinear searching between two neighbouring HD data directly in the HD space, as shown in Figure 3. In the two experiments (walking and running models), the HD data generated through nonlinear searching in the LD space (Figure 3(b)) and converted to human motion models resemble human motion shapes more closely than those generated through nonlinear searching in the HD space (Figure 3(a)). Finally, searching in the LD space has a lower computational cost than searching in the HD space.

Orthogonal Least Squares Learning of Multiple Outputs
In this section, we propose a method called orthogonal least squares learning of multiple outputs (OLSLMO).In the LD space or HD space, the data are multiple dimensions, for which the feature or model can be described better.Let us denote the model as follows: Here, matrices of the output and input vectors are, respectively,  matrix; A ∈  × is the invertible matrix with 1 on the diagonal given by let least squares (LS) estimator θ = (P  P) −1 P  D,  g = ( W W) −1 W D, and we can give the derivation as below: Equation ( 13) can be written by where and    is corresponding orthogonal vector set from vector set {p  1 , . . ., p    , . . ., p   }.The value of  can be confirmed according to the condition ‖D − P θ‖ 2 /(× 1 ) = ‖D− W g‖ 2 /(× 1 ) = tr(e  e)/(× 1 ) <  1 ,  1 > 0. When this condition is satisfied, we let where R  = ones(, 1) ∈  ×1 and error matrix is e  ∈  × 1 .According to the equation and updating method [20] of LS,  D can denote the LS estimator of D, and we let where if condition ‖D − P θ‖ 2 /( ×  1 ) = ‖D − W g‖ 2 /( ×  1 ) = tr(e  e)/( ×  1 ) <

Low Dimensional Space Incremental Learning
After low dimensional space searching (LDSS) has been used to estimate the human motion models of the corresponding frames in 3D (about one motion cycle), new HD data samples can be obtained from the models. The LDSS algorithm is given in the framework of LDSIL in Section 7.2. If the mapping of (10) continues to be used to generate HD data samples, the results become less accurate; thus incremental learning is needed to update the mapping from the LD space to the HD space. Equation (10) is so complicated that its parameters cannot be updated easily, because the calculation in (9) is required. For this reason, we still use a model like (11) to build the mapping, which can learn from new HD data samples through the method of Algorithm 1. The details of incremental learning are discussed as follows.
The key of incremental learning is how to select the HD data samples to update the mapping from LD space to HD space.The better new samples used to training the mapping can be selected from the estimated HD data with spatial position information removed.The estimated HD data samples contain errors, which can convert into human motion models of corresponding frames.After among LD data can describe the similarity of HD data, and we need to select the LD data from X * new , which are welldistributed in the LD space for getting the corresponding HD data.The selecting rule can be carried out as below.We denote ordinal number as below: and then, we have and the LD data and the corresponding HD data can be selected by where where .After finishing the training of (26), the incremental learning is completed.Then, the mapping is updated, so that the new HD data can be generated through the updated mapping: The mapping of ( 27) is incremental learn mapping, which replaces (10) of IPDRM; it can promote the second advantage of IPDRM mentioned above, due to the mapping from LD data to HD data being updated to get more accurate.Worthy of attention is that at least one x *  * must distribute between x and x+1 , so that effectiveness of incremental learning can be better, thus using LDSS to generate HD data samples should follow this.
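The stated goal of the selection step, namely keeping new LD data that are well distributed in the LD space, can be illustrated with a simple stand-in heuristic. The sketch below is not the selection rule of this section; it is a greedy minimum-distance filter, and the name `select_well_spread` and the threshold `min_dist` are assumptions introduced for illustration.

```python
import math

def select_well_spread(known_ld, new_ld, min_dist):
    """Return indices of new LD points that are at least `min_dist` away
    from every known LD point and from every point selected so far.

    Stand-in heuristic (assumption), not the paper's selection rule.
    """
    kept_points = [list(p) for p in known_ld]
    selected = []
    for i, x in enumerate(new_ld):
        if all(math.dist(x, p) >= min_dist for p in kept_points):
            selected.append(i)
            kept_points.append(list(x))
    return selected
```

The returned indices identify both the selected LD data and, through the shared ordering, the corresponding HD data samples that would be used to retrain the mapping.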

Human Motion Estimation via Searching in Low Dimensional Space
7.1. The Method of Searching. We estimate the human motion model via searching in the low dimensional space. The method of searching combines linear searching and nonlinear searching in order to generate better corresponding HD data samples, as depicted in Figure 5. In Figure 5, the hollow dots denote the new LD data acquired by linear searching and nonlinear searching in the LD space; the solid dots denote the known LD data acquired by dimension reduction of the known HD data samples. Let us discuss more details as follows. g( ) We take X = [x , , , where ũV

), and ũ𝑠V
= rand(, 1).From ( 29) and (30), the set containing searched LD data can be where the optimal prior information of the pose  y *  ∈   2 can be obtained, and then the optimal prior HD data can be y pri = [ y *   ,  y *   ]  ∈   1 , and  y *  is the optimal spatial position information which can be obtained by SEMAS.
On the basis of work above, we can estimate the human motion model from multiview silhouettes through Bayesian theory.When y pri is known, (  y   |   ) is unknown and    is large enough; we let (  y   ) = (  y   | y = y pri ), and the estimation can be achieved by and, after getting  y *  , the human motion model can be drawn in the space.Worthy of attention is that (  y   | y pri ) is multivariate normal distribution density function, which takes y = y pri as the mean,    is the number of generating  y   , y pri is contained as one of    samples, which can let  y     = y pri , and w is normalized weight.The method of searching before or after incremental learning is also like this; it only needs to replace some variables to be carried out.
The poses of human motion are continuous, as shown by the human motion models. The pose estimated at the previous frame is close to that of the current frame; thus the pose estimated at the previous frame can be taken as the new HD sample (without spatial position information), and the corresponding LD data x* can be acquired by IPDRM. Then, we find the other LD data close to x* through clustering to determine the searching position in the LD space, so that HD data samples close to the true data of the current frame can be generated efficiently by searching in the neighbourhood of this position. Owing to these advantages, the proposed algorithm (LDSIL) can disambiguate silhouettes, overcome transient occlusion, and reduce estimation error. Furthermore, its performance can be improved through incremental learning, which updates the mapping from the LD space to the HD space.
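As a rough illustration of searching near a chosen position in the LD space, the following Python sketch generates candidate LD points between two neighbouring known LD points. It rests on assumptions that are not taken from (29) and (30): "linear searching" is modelled as evenly spaced interpolation along the segment, "nonlinear searching" as uniform random offsets in (-1, 1) scaled by `radius`, and the function name and parameters are hypothetical.

```python
import random

def search_ld_candidates(x_a, x_b, n_linear=5, n_random=5, radius=0.1, seed=0):
    """Generate LD candidates between neighbouring LD points x_a and x_b.

    Returns (linear, nonlinear): points on the segment, and randomly
    perturbed copies of those points. Illustrative assumption only.
    """
    rng = random.Random(seed)
    linear = []
    for k in range(1, n_linear + 1):
        t = k / (n_linear + 1)  # interior points only, endpoints are known data
        linear.append([a + t * (b - a) for a, b in zip(x_a, x_b)])
    nonlinear = []
    for _ in range(n_random):
        base = rng.choice(linear)
        nonlinear.append([v + radius * rng.uniform(-1.0, 1.0) for v in base])
    return linear, nonlinear
```

Each candidate would then be mapped to the HD space with the current LD-to-HD mapping and scored against the multiview silhouettes.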
As to the estimation of Bayesian theory above, we use SEMAS and the proposed method of searching to find the optimal prior HD data y pri , which can make (  y   ,   )/(  y   | y) = (  |  y   ) large enough.When  y   = y pri , according to (34), somehow, (  y   ,   ) also become large enough.Then, it means that y pri can be seen as the sample generating from the unimodal distribution density (  y   |   ), due to (  y   ,   ) ∝ (  y   |   ) = (  y   ,   )/(  ), and (  ) is known.Moreover, the true HD data can be seen as the mean relative to the distribution density (  y   |   ), and thus its value of distribution density will be larger than others.On the whole, it also means that y pri is the sample which is close to the true HD data; thus taking y pri as the mean (y = y pri ) with the small scale variance to search the HD data in the HD space for only once can generate the HD data samples which will also have the large value of distribution density (  y   |   ); then the mean of these samples can be close to mean relative to distribution density (  y   |   ); in other words, it will be close to the true HD data.The work above can avoid the small scale searching (sampling) the HD data for many times or the large scale searching (sampling) the HD data directly and generating the invalid HD data to estimate the human motion in 3D.
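The single small scale search around the prior described above can be sketched in Python as follows. The sketch assumes an isotropic normal distribution with standard deviation `sigma` around the prior, a user-supplied `likelihood` function standing in for the silhouette matching probability, and hypothetical names throughout; it shows only the weighted-mean step, with the prior itself included as one of the samples.

```python
import random

def estimate_pose(y_pri, likelihood, n_samples=200, sigma=0.05, seed=0):
    """Sample once around the prior y_pri and return the weighted mean.

    `likelihood(sample)` must return a nonnegative score; `sigma` and the
    isotropic normal sampling are assumptions made for illustration.
    """
    rng = random.Random(seed)
    samples = [list(y_pri)]  # the prior is kept as one of the samples
    for _ in range(n_samples - 1):
        samples.append([rng.gauss(m, sigma) for m in y_pri])
    raw = [likelihood(s) for s in samples]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("likelihood must be positive for at least one sample")
    weights = [r / total for r in raw]  # normalized weights
    estimate = [sum(w * s[d] for w, s in zip(weights, samples))
                for d in range(len(y_pri))]
    return estimate, weights, samples
```

When the prior is already close to the true HD data, the weights concentrate on samples near it, so one round of sampling with a small `sigma` is enough and repeated or large scale sampling in the HD space is avoided.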
Then, we will discuss more details about the whole procedure of human motion estimation in the following section.

The Procedure of Algorithm for Human Motion Estimation.
The work in Sections 2-7 can be summarized as the complete procedure of LDSIL for estimating human motion in 3D. The framework of LDSIL is shown in Figure 6 and described in more detail below.
Firstly, LDSIL uses LDSS to estimate the human motion in 3D based on a small number of HD data samples and obtains estimated HD data covering about one motion cycle. The framework of LDSS consists of SEMAS, linear and nonlinear searching in the LD space, IPDRM, and Bayesian theory. The estimated HD data, with the spatial position information removed, can be selected as the new HD data samples and used to update the mapping from the LD space to the HD space through IPDRM. Secondly, IPDRM is used to obtain the corresponding LD data of the new HD data samples through its mapping g of incremental dimension reduction. The new HD data samples are selected by comparing the distances among the corresponding LD data. After the comparison, the selected LD data and the corresponding HD data are used to update the mapping from the LD space to the HD space, which constitutes the incremental learning of the LD space. Thirdly, the new mapping f_new, updated with the selected HD data and the corresponding LD data, generates HD data more accurately; thus the estimation achieves better performance with the help of f_new. Finally, the estimation of human motion in 3D from the multiview silhouettes is carried out through the framework comprising SEMAS, linear and nonlinear searching in the LD space, IPDRM with the new mapping f_new, and Bayesian theory. When LDSIL is used to estimate the human motion, the following data are known.
We have the sequence of the small amount of HD data samples Ỹ (no spatial position information) covering one motion cycle and the HD data of initial frame (1st frame) containing the spatial position information (6 dimensions in total)  y * 0 =  y 1 . can denote the number of estimatedframes without incremental learning, which are used for the sample selection of incremental learning, and  * can denote the number of estimated frames with incremental learning.Then, we begin the procedure as follows.

LDSIL for Human Motion Estimation
(1) Use ( 9) to reduce the dimension of Ỹ, obtain X, build the mapping f, and let  y * 0 =  y 1 ,  = 1.(2) Use Algorithm 1 to training equation (11), build the mapping g, get IPDRM, and let mapping f  = f.

Experiments and Evaluations
We tested the performance of the proposed algorithm (LDSIL) in three views (V = 3) by comparison with three traditional algorithms: the annealed particle filter (APF) [4], the Gaussian particle filter (GPF) [21], and the particle filter (PF) [22]. The tested performance covered disambiguating silhouettes, overcoming transient occlusion, and estimation error. Besides, we compared LDSIL with LDSS to test the validity of incremental learning independently. LDSS was similar to LDSIL but had no incremental learning: its mapping from the LD space to the HD space was not updated, and it still used (10) to generate HD data. The performance of LDSIL and LDSS was compared in terms of the mean error and the maximum error.
Firstly, we tested the disambiguation of silhouettes, taking walking and running motions as the test cases. As the subfigures of Figure 7 show, after several frames had been estimated, APF, GPF, and PF could not disambiguate the silhouettes when estimating the human motion in 3D; in contrast, LDSIL could disambiguate the silhouettes and produced an estimate close to the true data. Ambiguity here meant that a set of silhouettes, such as those in Figure 1(c), could not distinguish the positions of the limbs of the human body; for example, the silhouettes could not show which leg was in front and which was behind. Thus, in the subfigures of Figures 7(a) and 7(b), the human motion projections estimated by APF, GPF, and PF in all views showed the left leg (white line) and the right leg (gray line) in swapped positions compared with the true data. The corresponding human motion models reflected the same phenomenon: the left leg (gray) and the right leg (black) of the model were also swapped.
Secondly, we tested the ability to overcome transient occlusion. Transient occlusion of the cameras in all views meant that the silhouettes from all views could not be obtained for some frames, so handling it was equivalent to estimating a frame that was not contiguous with the previously estimated one. Two frames (initial data and true data) separated by an interval of 10 frames were used to test all the algorithms. The subfigures of Figures 8(a) and 8(b) showed that APF, GPF, and PF could not estimate the human motion models accurately: the limbs of the human body did not match the images of all the views. In contrast, LDSIL estimated the models more accurately than APF, GPF, and PF, as depicted in the corresponding subfigures, and the limbs of the human body matched the images of all the views well, as the true data did.
Thirdly, we used 50 continuous frames to test the estimation error of each algorithm. We used walking and running motion sequences (walking 1-3, running 1-2) whose spatial positions changed noticeably. The subfigures of Figure 9 showed the experimental results: the mean error and standard deviation of LDSIL were the smallest among these algorithms, and the maximum error, given in the labels of the corresponding subfigures, was also the smallest; besides, the errors of most frames from LDSIL were the smallest. The error of each frame was computed as in [23] from Bx_ma and its estimate, which were, respectively, the true position and the estimated position of joint marker ma in the model, where Mar was the number of markers. From Figures 9(b)-9(f), we found that the errors of all the tested algorithms were close over the first 20 frames, but the error of LDSIL remained smaller than those of APF, GPF, and PF after the 20th frame. It was reasonable that the errors of APF, GPF, and PF were close to that of LDSIL in some frames, because the initial data were close to the true data and the sampling was stochastic; on the whole, however, their errors were still larger than that of LDSIL.
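For readers who wish to reproduce the per-frame error, a minimal Python sketch is given below. It assumes that the error is the average Euclidean distance between true and estimated marker positions over the Mar markers, which is the usual marker-based pose error; the function name `frame_error` is hypothetical.

```python
import math

def frame_error(true_markers, estimated_markers):
    """Average Euclidean distance between true and estimated 3D marker
    positions of one frame (assumed form of the per-frame error)."""
    if not true_markers or len(true_markers) != len(estimated_markers):
        raise ValueError("marker lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) for a, b in zip(true_markers, estimated_markers))
    return total / len(true_markers)
```

The mean error, standard deviation, and maximum error reported per sequence would then be statistics of `frame_error` over the 50 tested frames.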
Finally, we tested the validity of the incremental learning of LDSIL. Unlike LDSIL, LDSS had no incremental learning. We arbitrarily selected 3 motion sequences and tested the estimation error 6 times, with both known and unknown spatial position information. With the same parameter settings in all experiments, LDSIL performed better than LDSS in terms of the mean error and the maximum error, as shown in Table 1. The experimental results indicated that incremental learning played the key role, because LDSIL had the smaller mean error and maximum error in the tests.

Conclusions
The experiments above indicated that the proposed algorithm (LDSIL) contributes substantially to estimating human motion, including its spatial position, in 3D from multiview silhouettes. The results showed that LDSIL performed best in disambiguating silhouettes, overcoming transient occlusion, and reducing estimation error when compared with the other three algorithms (APF, GPF, and PF). Meanwhile, the feasibility and performance of incremental learning in LDSIL were also validated by the experiments. In addition, the segmentation of the multiview images must be of high quality; otherwise the experimental results are affected. Our work has some limitations. Firstly, the initial frame of the human motion model (the 1st frame), which contains the spatial position information, is assumed to be known; secondly, the spatial position and pose of the estimated human motion in 3D change regularly, and the human motion model has markers. In future work, we will address these limitations: we will remove the requirement that the initial frame be known, focus on complex human motion estimation, and use a human motion model without markers [24,25].

Figure 2: The comparison of traditional Monte Carlo method and SEMAS.

Figure 3: Comparison of searching in the HD space and LD space among neighbour data samples.

Figure 4: The low dimensional data through incremental learning.

Figure 5: The method of searching.