Effective Inertial Hand Gesture Recognition Using Particle Filtering Based Trajectory Matching

Hand gesture recognition has become more and more popular in applications such as intelligent sensing, robot control, and smart guidance. In this paper, an inertial sensor based hand gesture recognition method is proposed. The method obtains the trajectory of the hand by means of a position estimator: attitude estimates are used to produce velocity and position estimates. A particle filter (PF) is employed to estimate the attitude quaternion from gyroscope, accelerometer, and magnetometer measurements. The improvement lies in the resampling method, which makes the original filter converge much faster. After smoothing, the trajectory is converted to low-definition images, which are then sent to a backpropagation neural network (BP-NN) based recognizer for matching. Experiments on real-world hardware demonstrate the effectiveness and uniqueness of the proposed method. Compared with representative methods using accelerometers or vision sensors, the proposed method proves to be fast, reliable, and accurate.


Introduction
With the development of mobile platforms, applications of hand gesture recognition have become more and more popular. For instance, Google Inc. released its Google Project Glass in 2012, which utilizes a very simple hand gesture recognition system. Basically, hand gesture recognition systems can be classified into two groups: vision based and inertial sensor based [1].
As proposed in [2,3], a vision-based hand gesture recognition system tracks the features of the user's hand, analyzes the user's motion, and outputs the final interpreted command. The most widely used recognition methods are artificial-intelligence-based [4][5][6], for example, the neural network, the support vector machine, and the hidden Markov model. This kind of recognition system is relatively reliable, but its modeling and matching are time-consuming. According to stereo vision theory [7], at least two cameras are needed to accurately estimate the three-dimensional attitude and velocity of an object. However, such a configuration significantly increases the economic cost of the system, including the core processors and the data acquisition subsystem. The time complexity of data processing and computation is also high, which limits the development of applications on mobile platforms.
Other recognition methods are inertial sensor based [8]. These methods usually use low-cost MEMS accelerometers to detect the motion of the human hand [9][10][11][12][13]. Gyroscopes can also be used to measure the attitude of the user's hand in order to analyze the gesture more accurately [14,15]. By combining gyroscopes and accelerometers, the real-time velocity can be determined, which is helpful for hand gesture recognition [16,17]. However, the accuracy of such methods needs improvement, since with only one sensor output the observability of the gesture may not be satisfactory.
The above methods were enabled by advances in computer vision and integrated sensor fusion. In vision-based hand gesture recognition, feature extraction and learning are always important. There are many related techniques, such as the Scale-Invariant Feature Transform (SIFT [18]), the Gradient Location and Orientation Histogram (GLOH [19]), and the Convolutional Neural Network (CNN [20]). Extracting motion from a picture requires analyzing the correspondence between the current frame and the last frame, for example, with RANdom SAmple Consensus (RANSAC [21]). However, these techniques are time-costly. In fact, by using inertial sensors, the object's motion can be computed with much simpler frameworks. In recent years, attitude and position estimation based on MEMS sensors has been extensively studied [22], which provides new ways of obtaining the hand's motion [23].
The main purpose of this research is to find a way to determine the correct human hand gesture using inertial sensors only. In papers published between 2012 and 2016, many authors use learning techniques to model the gesture from sensor data captured at one single moment. This is not very reliable or accurate, because a gesture can only be well determined if the history data is also included in the analysis. To implement this, trajectory determination is vital. Therefore, this paper first computes the trajectory of the human hand by inertial integration of attitude, velocity, and position. It has to be pointed out that the attitude estimation significantly influences the velocity and position results; hence a PF is introduced for accurate and convergent estimates of the attitude quaternion. The main contributions of this paper are as follows: (1) We use inertial sensors, including gyroscopes and accelerometers, to estimate the position of the user's hand. The first step is to fuse the sensor outputs into the attitude quaternion using the PF. The velocity and position updates are then performed using the ZUPT-aided equations. (2) After recording the trajectory of the hand, the trace is saved as a low-definition image, which is further fed into the BP-NN based gesture recognition system proposed in [24]. (3) The proposed method is systematically verified by experiments showing its advantages in estimation accuracy and feasibility of implementation. It is also compared with a representative accelerometer-based gesture recognition algorithm, which proves its superiority in success rate.
Figure 1 shows the structure of the recognition system of this paper. The paper is briefly structured as follows: Section 2 introduces the trajectory estimation method. Section 3 describes the BP-NN based hand gesture recognition method. Experiments, simulations, and results are given in Section 4. Section 5 contains the concluding remarks.


Relative Position Estimation

In fact, the rotation vector is an eigenvector of the direction cosine matrix (DCM) and satisfies [25]

(C - I) \Phi = 0, (1)

where C is the DCM, I is the identity matrix, and \Phi is the rotation vector, defined by

\Phi = (\Phi_x, \Phi_y, \Phi_z)^T, (2)

where \Phi_x, \Phi_y, and \Phi_z are the projections of \Phi on the x-, y-, and z-axes. The differential equation of the rotation vector is given by [25]

\dot{\Phi} = \omega + \frac{1}{2} \Phi \times \omega + \frac{1}{\Phi^2} \left[ 1 - \frac{\Phi \sin\Phi}{2(1 - \cos\Phi)} \right] \Phi \times (\Phi \times \omega), (3)

where \omega = (\omega_x, \omega_y, \omega_z)^T is the real-time angular velocity. Generally speaking, (3) can be simplified as

\dot{\Phi} \approx \omega + \frac{1}{2} \Phi \times \omega. (4)

By integrating (4), the real-time rotation vector can be calculated. The most important characteristic of the rotation vector is that it can effectively compensate for the noncommutativity error. In 1971, Bortz proposed an attitude estimation method that used the rotation vector to avoid the coning error [26]. Twelve years later, Miller proposed a discrete-rotation-vector based method with three samples in one integration period [27]. Savage summarized several compensation methods in [28,29]. According to Savage's theory, the three-sample rotation vector method supposes that the angular velocity can be described by a second-order polynomial within one updating period:

\omega(t) = a + 2b(t - t_{k-1}) + 3c(t - t_{k-1})^2, (5)

where the coefficients a, b, and c are determined from the three gyro angular increments \Delta\theta_1, \Delta\theta_2, and \Delta\theta_3 sampled within the period [25]. Expanding (4) with a Taylor series and supposing \Phi(t_{k-1}) = 0, the multiorder derivatives of the rotation vector can be obtained; inserting them into the expansion yields the discrete form of (4) [28]:

\Phi(T) = \Delta\theta + \frac{33}{80} \Delta\theta_1 \times \Delta\theta_3 + \frac{57}{80} \Delta\theta_2 \times (\Delta\theta_3 - \Delta\theta_1), (9)

where \Delta\theta = \Delta\theta_1 + \Delta\theta_2 + \Delta\theta_3 is the vector sum of the angular increments. Equation (9) is called the three-sample rotation vector algorithm. Once the rotation vector is calculated, it can be converted to a quaternion:

q = \left( \cos\frac{\Phi_0}{2}, \; \frac{\Phi}{\Phi_0} \sin\frac{\Phi_0}{2} \right)^T, (10)

where \Phi_0 = \sqrt{\Phi_x^2 + \Phi_y^2 + \Phi_z^2} denotes the length of the rotation vector. Multiplying the quaternions from the initial time, the DCM from the hand frame to the geographical frame can be given by [28]

C_b^n = C(q_0 \otimes q_1 \otimes \cdots \otimes q_k), (11)

where b represents the hand (body) frame and n represents the North-East-Down (NED) coordinate system. The DCM forms the mathematical platform of the Strapdown Inertial Navigation System (SINS).
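As a concreteness check, the three-sample attitude update described above can be sketched in code. The following is a minimal illustration, not the authors' implementation; the function names and the small-angle guard are our own choices, and the coning coefficients 33/80 and 57/80 are those of the classical three-sample algorithm [27,28].

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def three_sample_rotation_vector(d1, d2, d3):
    """Three-sample rotation vector with coning compensation.
    d1, d2, d3: gyro angular increments over the three sub-intervals (rad)."""
    dtheta = [d1[i] + d2[i] + d3[i] for i in range(3)]   # vector sum of increments
    c1 = cross(d1, d3)
    c2 = cross(d2, [d3[i] - d1[i] for i in range(3)])
    return [dtheta[i] + (33.0/80.0)*c1[i] + (57.0/80.0)*c2[i] for i in range(3)]

def rotvec_to_quat(phi):
    """Convert a rotation vector to a unit attitude quaternion (cos/sin form)."""
    n = math.sqrt(phi[0]**2 + phi[1]**2 + phi[2]**2)
    if n < 1e-12:                       # small-angle guard: identity rotation
        return [1.0, 0.0, 0.0, 0.0]
    s = math.sin(n / 2.0) / n
    return [math.cos(n / 2.0), phi[0]*s, phi[1]*s, phi[2]*s]
```

For a pure rotation about one axis the coning terms vanish and the result reduces to the plain angle sum, which is a quick sanity check on the implementation.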
The MEMS gyroscope usually has a bias in its output, which leads to drift of the integrated attitude. Therefore, the accelerometer is usually used to compensate for the drift. In engineering practice, observers and optimal filtering techniques are usually adopted to improve the accuracy of the attitude estimation [23]. In the next subsection, we introduce an Improved Unscented Particle Filter for such state estimation.

Particle Filtering for Attitude Estimation

The inertial attitude determination can be acquired from the equations in the last subsection. However, in real practice, other sensors such as the accelerometer and magnetometer are integrated with the gyroscope for much more accurate and stable results. We now introduce the particle filtering algorithm proposed in [30]. Assume that the discrete state space has the following model:

X_k = f(X_{k-1}) + W_{k-1},
L_k = h(X_k) + V_k,

where X_k and L_k represent the state vector and the observation vector, respectively, and W_k and V_k are independent zero-mean white Gaussian noises (WGNs). The particle filter then has the following calculation procedure:

(1) Initialization: at k = 0, we draw N sample points X_0^(i) with weights w_0^(i) = 1/N from the importance function, where i = 1, 2, ..., N. Using

w_k^(i) \propto w_{k-1}^(i) \, p(L_k | X_k^(i)) \, p(X_k^(i) | X_{k-1}^(i)) / q(X_k^(i) | X_{k-1}^(i), L_k),

we may compute each particle's weight, where q denotes the importance function and p is the probability density function; the former p is the likelihood function while the latter is the state transition one.

(2) Forecast: at time epoch k, we forecast the N particles through the state model, where the process noise follows the probability density function p(W_k).

(3) Update: the importance weights of the N particles are updated using the formula above and then normalized.

(4) Resampling: given a threshold number of particles N_th, the effective number of particles is

N_eff = 1 / \sum_{i=1}^{N} (w_k^(i))^2.

When N_eff < N_th, we resample the N particles and reset their weights. The final state estimate is then computed as X̂_k = \sum_{i=1}^{N} w_k^(i) X_k^(i). In this paper, the resampling technique is chosen as residual resampling.
The presented scheme is a sequential Monte Carlo suboptimal method. In attitude estimation, the quaternion can be used as the state vector, and the accelerometer-magnetometer combination provides the measurement model. The measurement equation shows that the direction cosine matrix is a quadratic function of the attitude quaternion. Hence there are nonlinearities inside the measurement model, and it is very suitable to use particle filtering for the state estimation.
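For reference, the quadratic dependence of the DCM on the quaternion q = (q_0, q_1, q_2, q_3)^T mentioned above has the standard explicit form:

```latex
C_b^n(\mathbf{q}) =
\begin{pmatrix}
q_0^2+q_1^2-q_2^2-q_3^2 & 2(q_1q_2-q_0q_3) & 2(q_1q_3+q_0q_2)\\
2(q_1q_2+q_0q_3) & q_0^2-q_1^2+q_2^2-q_3^2 & 2(q_2q_3-q_0q_1)\\
2(q_1q_3-q_0q_2) & 2(q_2q_3+q_0q_1) & q_0^2-q_1^2-q_2^2+q_3^2
\end{pmatrix}
```

Each entry is a second-order monomial in the quaternion components, which is exactly the nonlinearity that makes the measurement model non-Gaussian-friendly and motivates the particle filter.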
With the above algorithm, we may improve the conventional nonlinear estimation results that are mainly generated by the Extended Kalman Filter (EKF [31]). Here we define the state variable as X = (q_0, q_1, q_2, q_3)^T. The state propagation model is given in the last subsection, and the variance information has been systematically derived; related materials can be found in [32]. With the presented approach, the filtering can then be continued recursively.
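The four PF steps and the residual resampling can be sketched as follows. This is a toy scalar illustration, not the attitude filter itself: the random-walk state model, the noise levels `q_std` and `r_std`, and the helper names are illustrative assumptions, with the bootstrap choice q = p(X_k | X_{k-1}) so that the weight update reduces to the likelihood.

```python
import math
import random

def residual_resample(particles, weights):
    """Residual resampling: keep floor(N*w) deterministic copies of each
    particle, then draw the remainder randomly from the residual weights."""
    n = len(particles)
    counts = [int(n * w) for w in weights]
    residual = [n * w - c for w, c in zip(weights, counts)]
    k = n - sum(counts)
    if k > 0:
        for i in random.choices(range(n), weights=residual, k=k):
            counts[i] += 1
    new = [particles[i] for i, c in enumerate(counts) for _ in range(c)]
    return new, [1.0 / n] * n

def pf_step(particles, weights, observe, z, q_std=0.05, r_std=0.1, n_th=None):
    """One forecast/update/resample cycle on a scalar random-walk model."""
    n = len(particles)
    # (2) Forecast: propagate each particle through the process model.
    particles = [x + random.gauss(0.0, q_std) for x in particles]
    # (3) Update: weight by the Gaussian likelihood of the observation z.
    weights = [w * math.exp(-0.5 * ((z - observe(x)) / r_std) ** 2)
               for x, w in zip(particles, weights)]
    s = sum(weights) or 1e-300
    weights = [w / s for w in weights]                 # normalization
    # (4) Resampling when the effective particle number drops below N_th.
    n_eff = 1.0 / max(sum(w * w for w in weights), 1e-300)
    if n_eff < (n_th if n_th is not None else n / 2):
        particles, weights = residual_resample(particles, weights)
    return particles, weights
```

The state estimate after each cycle is the weighted mean of the particles; repeated observations of a constant value should pull the particle cloud onto that value.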

Velocity and Position Determination.
The differential equation of the velocity is given by [28]

\dot{v}^n = C_b^n f^b - (2\omega_{ie}^n + \omega_{en}^n) \times v^n + g^n, (13)

where v^n is the velocity of the hand, f^b is the acceleration measured by the accelerometer, \omega_{ie}^n is the rotational angular velocity of the Earth in NED, \omega_{en}^n is the angular velocity of NED relative to the Earth-Centered, Earth-Fixed (ECEF) coordinate system, and g^n is the gravity vector, which can be written as

g^n = (0, 0, g)^T, (14)

where g is the local gravitational acceleration. Usually, MEMS gyroscopes cannot sense the rotational angular velocity of the Earth, and both \omega_{ie}^n and \omega_{en}^n are very small in most low-speed cases. So (13) can be simplified in this paper as

\dot{v}^n = C_b^n f^b + g^n. (15)

In accordance with (15), the discrete form of the velocity update equation is given by [29]

v_k^n = v_{k-1}^n + C_{k-1} \Delta v(k) + g^n T, (16)

where C_{k-1} is the DCM at the moment k-1, T is the update period, and the angular increment \Delta\theta is a function of time. Obviously, the most significant term of (16) is C_{k-1} \Delta v(k), which can be expanded as

\Delta v(k) = \Delta v + \frac{1}{2} \Delta\theta \times \Delta v + \Delta v_{scul}, (18)

where \Delta v is the integral of f^b over the update period, (1/2)\Delta\theta \times \Delta v compensates for the frame rotation, and \Delta v_{scul} is the sculling motion term. According to [29], the three-sample sculling term is formed from the per-sample angular increments \Delta\theta_i and velocity increments \Delta v_i, with coefficients dual to those of the coning correction (9). With (16)-(20), the real-time velocity can be calculated. The position of the hand in NED can then be calculated by integrating the velocity:

p_k^n = p_{k-1}^n + v_k^n T. (21)

Here, we suppose that the initial position vector is p_0^n = 0; the real-time position of the hand is calculated and recorded. The recorded trace is projected onto the vertical plane, and the projection is saved as a 64 × 32 image, which can later be recognized using the BP-NN.
In real applications, the velocity and position estimates may diverge due to the bias of the accelerometer. Therefore, the zero-velocity update (ZUPT) is introduced to overcome this disadvantage [33]. That is, when the measured acceleration is less than a set threshold (usually the absolute value of the accelerometer's bias), the acceleration is not integrated, and consequently neither is the position.
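The ZUPT-aided velocity and position integration can be sketched as below. This is a simplified illustration under our own naming, not the paper's full algorithm: the rotation and sculling corrections are omitted for clarity, the DCM is supplied per sample by a caller-provided function, and the ZUPT threshold value is an assumption.

```python
def integrate_position(dcm_at, f_b_samples, dt, g=9.81, zupt_th=0.05):
    """Integrate body-frame specific force into an NED position trace.
    dcm_at:      function mapping sample index k to the 3x3 DCM C_b^n at k
    f_b_samples: list of specific-force 3-vectors measured in the body frame
    dt:          sampling period in seconds
    zupt_th:     zero-velocity threshold on the net acceleration (assumed)"""
    v = [0.0, 0.0, 0.0]
    p = [0.0, 0.0, 0.0]
    trace = [tuple(p)]
    for k, f_b in enumerate(f_b_samples):
        C = dcm_at(k)
        # Transform specific force to NED and add gravity (NED: +g is down).
        a_n = [sum(C[i][j] * f_b[j] for j in range(3)) for i in range(3)]
        a_n[2] += g
        if all(abs(a) < zupt_th for a in a_n):
            v = [0.0, 0.0, 0.0]          # ZUPT: hand treated as stationary
        else:
            v = [v[i] + a_n[i] * dt for i in range(3)]
        p = [p[i] + v[i] * dt for i in range(3)]
        trace.append(tuple(p))
    return trace
```

At rest the accelerometer measures only the reaction to gravity, so the net NED acceleration is near zero and the ZUPT branch clamps the velocity, which is exactly what suppresses the bias-driven drift.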

The Structure of BP-NN.
The backpropagation neural network is a multilayer neural network (MNN) with at least three layers, each consisting of several neurons. Each neuron in the hidden layer is connected to all neurons in both the preceding and the following layer, while there are no connections between neurons within a layer. When training the BP-NN, the activation values of the neurons spread from the input layer to the output layer. To reduce the error between the expected outputs and the actual outputs, the backpropagation algorithm (BPA) adjusts the weights between the neurons from the output layer back to the input layer. Just like feedback control in modern control theory, the error is thus decreased iteration after iteration. When the error falls below a predetermined threshold, the training process stops, and the trained BP-NN can be used for gesture recognition. In this paper, a three-layer BP-NN is used for recognition, as illustrated in Figure 2.

The Training Algorithm of BP-NN.
According to neural network theory, the basic training method can be given as follows.
The training process of the BP-NN is actually an optimization problem. The training algorithm based on the Gradient Descent Method (GDM) can be given by

\Delta w_{ij} = \eta \, \delta_j \, o_i,

where \eta is the learning rate, o_i denotes the real output of the i-th neuron, and \delta_j is the error term of the j-th neuron. For an output-layer neuron with a sigmoid activation, \delta_j can be given by

\delta_j = (d_j - o_j) \, o_j (1 - o_j),

where d_j is the ideal output of the j-th neuron and o_j is its real output.
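A minimal sketch of the GDM training loop for a three-layer network is given below. This is our own illustrative implementation, not the MATLAB toolbox used in the experiments; the layer sizes, learning rate eta, and the sigmoid activation are assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ThreeLayerBPNN:
    """Three-layer BP-NN trained by plain gradient descent (sketch)."""
    def __init__(self, n_in, n_hid, n_out, eta=0.5):
        rnd = lambda: random.uniform(-0.5, 0.5)
        # One extra column per layer holds the bias weight.
        self.w1 = [[rnd() for _ in range(n_in + 1)] for _ in range(n_hid)]
        self.w2 = [[rnd() for _ in range(n_hid + 1)] for _ in range(n_out)]
        self.eta = eta

    def forward(self, x):
        self.x = list(x) + [1.0]                          # append bias input
        self.h = [sigmoid(sum(w * v for w, v in zip(row, self.x)))
                  for row in self.w1] + [1.0]             # hidden + bias
        self.o = [sigmoid(sum(w * v for w, v in zip(row, self.h)))
                  for row in self.w2]
        return self.o

    def train_step(self, x, d):
        o = self.forward(x)
        # Output-layer error terms: delta_j = (d_j - o_j) o_j (1 - o_j).
        delta_o = [(dj - oj) * oj * (1 - oj) for dj, oj in zip(d, o)]
        # Hidden-layer error terms, back-propagated through w2 (pre-update).
        delta_h = [self.h[j] * (1 - self.h[j]) *
                   sum(delta_o[k] * self.w2[k][j] for k in range(len(delta_o)))
                   for j in range(len(self.h) - 1)]
        # Weight updates: dw_ij = eta * delta_j * o_i.
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += self.eta * delta_o[k] * self.h[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += self.eta * delta_h[j] * self.x[i]
```

Repeated `train_step` calls on a training set realize exactly the "feedback" error decrease described above; the loop stops once the error falls below the chosen threshold.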

Grid-Based Feature Extraction.
As has been utilized in [34], the grid statistical feature extraction method is very popular. In this paper, the 64 × 32 image is divided into 16 × 8 = 128 grids. We use the digits from 0 to 9 as the ideal hand gestures. The divided digits 0 and 8 are shown in Figure 3.
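The grid feature computation can be sketched as follows, assuming a binary 64 × 32 trace image stored row by row; the per-cell statistic used here (the fraction of set pixels) is our assumption of the grid statistical feature.

```python
def grid_features(image, rows=16, cols=8):
    """Grid statistical features of a binary trace image.
    image: 64 rows of 32 values (0/1), divided into 16 x 8 = 128 cells
    of 4 x 4 pixels each (dimensions taken from the paper).
    Returns one statistic per cell: the fraction of set pixels."""
    h, w = len(image), len(image[0])
    gh, gw = h // rows, w // cols          # cell height and width (4 x 4)
    feats = []
    for r in range(rows):
        for c in range(cols):
            cell = [image[r*gh + i][c*gw + j]
                    for i in range(gh) for j in range(gw)]
            feats.append(sum(cell) / float(gh * gw))
    return feats
```

The 128-dimensional feature vector is what would be fed to the 128-input layer of the BP-NN recognizer.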

Experiment
4.1. Platform Setup. The proposed algorithm fuses the inertial sensor data into an attitude quaternion and then computes the velocity and position. The position history describes the trajectory of the human hand well. We therefore especially designed an experimental platform for this validation. Figure 4 presents the tower development platform with an NXP Kinetis MK60DN512 microcontroller. The platform has a core processing speed of 100 MHz along with SDIO, SPI, UART, WiFi, and CAN bus interfaces, which allows for data acquisition and logging from wearable inertial sensors. A miniature inertial sensor module, comprising an MPU6000 gyroscope-accelerometer combination and an HMC5983 magnetometer, is attached to the designed platform using an RS232 cable free of electromagnetic interference (see Figure 5). The module measures 3 cm × 3 cm, making it flexible to mount on the human hand. The data polling rate is set to 500 Hz for the MPU6000 and 220 Hz for the HMC5983. Apart from this, the miniature module can also produce reference attitude outputs in quaternion form. According to the reference manual of the product, its attitude estimation algorithm is effective and accomplished by an EKF. In the later comparisons, the results are compared against these reference Euler angles.
4.2. Attitude, Velocity, and Position Estimation. The initial attitude is set as X_0 = q = (1, 0, 0, 0)^T. The initial variance of the state vector is defined as Σ_{W,0} = 0.001 I_{4×4}. In the particle filter design, we set N = 50, together with the resampling threshold N_th. The raw sensor data are shown in Figure 6, while the generated attitude outputs are shown in Figure 7. We can see that the proposed attitude estimator has basically the same performance as the reference system.
The calculated attitude is then used for the velocity and position integration. After the position integration, the trajectories are saved as images. The first 30 trajectories are used as training samples, and printed characters are also added to the training set, as shown in Line 4 of Figure 8.
As can be seen in Figures 9 and 10, the performance of LMM is better than that of GDM in this case. The MSEs from the different combinations are grouped in Table 1.
The MSEs of the test data are given in Table 2. Obviously, GDM is not reliable in this setting at all, because its success rate is too low for real applications. The STF-LMM combination shows the best performance among all combinations and is therefore used for recognition as a component of the proposed system. We also compare the proposed system with a recent representative method, proposed by Xu et al., in which an accelerometer is adopted for gesture recognition [13]. The advantage of the proposed method is that it logs the history of the gesture movement, so that the hand gesture can be determined more accurately. We first generate several gestures with the designed platform and then use the aforementioned parameters and the modeled BP-NN to verify the success rates of both methodologies. The general results are summarized in Table 3.
We can see that, for some instant gestures, the two methods show little macroscopic difference. However, when the gesture becomes slow, so that its recognition relies on its history, the proposed method shows much more superiority. This verifies the parameters and models described above and also proves the feasibility and efficiency of the proposed algorithm.

Conclusion
In this paper, we propose a hand gesture recognition system that combines inertial sensors and a BP-NN.
Figure 3: (a) Trace of zero in grid; (b) trace of eight in grid.

Figure 4: Designed hardware for data acquisition from hand gesture sensor.

4.3. Neural Network. The Neural Network Toolbox of MATLAB is utilized for training the BP-NN. In this section, the Levenberg-Marquardt Method (LMM) and GDM are adopted as training algorithms, respectively. The Mean Square Error (MSE) is used as the evaluation standard for the performance of the trained BP-NN. The performances of the BP-NNs are compared for different training algorithms and activation functions, as shown in Figures 9 and 10 and Tables 1 and 2.

4.4. Gesture Recognition Results. With the designed system shown above, we make several comparison experiments with a recent representative method [13].