A Fusion Recognition Method Based on Multifeature Hidden Markov Model for Dynamic Hand Gesture

In this paper, a fusion method based on multiple features and hidden Markov models (HMMs) is proposed for recognizing the dynamic hand gestures corresponding to an operator's instructions in robot teleoperation. First, a valid dynamic hand gesture is segmented from continuously acquired data according to the velocity of the moving hand. Second, a feature set is introduced for dynamic hand gesture expression, which includes four sorts of features: palm posture, bending angle of the fingers, opening angle of the fingers, and gesture trajectory. Finally, HMM classifiers based on these features are built, and a weighted calculation model fusing the probabilities of the four sorts of features is presented. The proposed method is evaluated by recognizing dynamic hand gestures acquired by leap motion (LM), and it reaches recognition rates of about 90.63% on the LM-Gesture3D dataset created for this paper and 93.3% on the letter-gesture dataset.


Introduction
Dynamic hand gesture recognition has become an intriguing problem in recent years; if efficiently solved, it could provide one of the richest means of human-computer communication. Because of this, many scholars from all over the world have conducted extensive theoretical and practical research [1]. Compared with static gestures, the meaning of dynamic gestures is more abundant, and they offer a more common and natural way of interaction. At the same time, however, the information of a dynamic hand gesture, such as shape and location, varies with time, which consequently increases the difficulty of recognition.
At present, there are two main types of sensors capable of sensing hand gestures: wearable sensors and vision-based sensors [2,3]. The former approach can capture the movement of the hands and fingers and sufficiently extract hand information, but at the expense of convenience and cost: it places an additional burden on users and can feel unnatural for performing hand gestures. A vision-based sensor, in contrast, is less cumbersome and allows more natural interaction than a wearable sensor because it requires no physical contact with users. However, its computational complexity is quite high for hand detection, tracking, and feature extraction [4]. For instance, a hand must be separated from the background before the final recognition, which can be significantly affected by external environmental factors such as ambient light. Moreover, due to the complex 3D movements of hands and fingers, it is difficult to properly understand the performed hand pose from information extracted from 2D images [5]. Besides, once the palm surface is not parallel to the camera, for example, recognition becomes even harder. Classification is a crucial step in recognizing hand gestures. Five main methods for classifying hand gestures based on 3D vision can be identified: support vector machines (SVMs), artificial neural networks (ANNs), template matching (TM), HMMs, and dynamic time warping (DTW) [4]. The SVM is a popular classifier for hand gesture recognition, in which support vectors are used to determine the hyperplane that realizes the maximum separation of the hand gesture classes [6]. In vision-based hand gesture recognition systems, the ANN is typically used as a classifier for only fundamental and limited hand gestures [7]. When high-level discriminative 3D hand features are available, TM is an excellent choice for recognizing hand gestures and works quite well with contour- or boundary-based hand features [8].
As the hand gesture is a continuous pattern in time, the HMM is found to be the most suitable pattern recognition tool for testing on a moderately large dataset [9]. DTW is an indirect continuous hand gesture recognition approach that automatically aligns sequences of different lengths and returns the proper distance [10].
Martin Sagayam and Jude Hemanth [11] develop a probabilistic model based on state sequence analysis in the HMM to recognize hand gestures taken from the Cambridge hand dataset. The experimental results show that the proposed method achieves a 0.98% reduction in error rate and a 1.55% improvement in recognition rate over Viterbi prediction. Some work combines the HMM with other methods for gesture recognition. Zhou et al. [12] use HMMs to model the different information sequences of dynamic hand gestures and use a BP neural network (BPNN) as a classifier to process the resulting hand gestures modeled by the HMMs, which achieves satisfactory real-time performance and an accuracy above 84%. Martin Sagayam and Jude Hemanth [13] propose a hybrid 1D HMM model with artificial bee colony (ABC) optimization.
The method is carried out with nine different classes of hand gestures used for virtual reality applications. The experimental results show that, with ABC optimization, the average recognition rate increases by 2.72% and the average error rate decreases by 0.47%.
With the emergence and development of deep learning technology, some scholars have tried to apply it to hand gesture recognition. Oyedotun and Khashman [14] apply a convolutional neural network (CNN) and a stacked denoising autoencoder (SDAE) to recognize 24 American Sign Language (ASL) hand gestures obtained from a public database, achieving recognition rates of 91.33% and 92.83%, respectively. Bao et al. [15] propose a deep CNN that can classify hand gestures from the whole image without any segmentation or detection stage. The method can classify seven sorts of hand gestures in a user-independent manner and achieves an accuracy of 97.1% on a dataset with simple backgrounds and 85.3% on a dataset with complex backgrounds.
In recent years, 3D sensors, such as binocular cameras, Kinect, and LM, have been applied to hand gesture recognition with excellent performance. LM can detect and track hands and fingers with an accuracy of about 0.01 mm and feed back the gesture information in real time with a sampling rate of 120 fps [16]. Because of this superior performance, many researchers consider it a promising 3D sensor particularly suitable for hand gesture recognition. For instance, Chen et al. [17] extract directional codes of the 3D motion trajectory as the feature and exploit an SVM-based classifier to classify letter and number gestures. Ameur et al. [18] extract the positions of the fingertips and palm center as features that are then trained with an SVM classifier. Their method reaches an average recognition rate of about 81% on 11 kinds of dynamic gestures. Xu et al. [19] and Zeng et al. [20] have also conducted similar studies. Besides, some researchers are working on dynamic gesture recognition. Lu et al. [21] build two kinds of features and feed them into a hidden conditional neural field classifier to recognize dynamic gestures. Avola et al. [22] propose long short-term memory (LSTM) recurrent neural networks (RNNs) combined with an effective set of discriminative features based on both joint angles and fingertip positions to recognize sign language and semaphoric hand gestures, achieving an accuracy of over 96%. Vamsikrishna et al. [9] propose a low-cost computer-vision-assisted setup based on LM to detect precise movements of the palm or fingers within the field of view of the sensor. They then present a set of discrete HMMs for classifying the gesture sequences performed during rehabilitation. This paper is aimed at recognizing the hand gestures corresponding to an operator's hand commands in robot teleoperation. For this problem, the paper develops four feature vectors and their extraction models based on 3D information acquired by LM to describe the hand gestures.
Then, the article establishes HMMs to calculate the occurrence probabilities of the four feature sequences of an unknown hand gesture, respectively. Lastly, the paper uses a weighted algorithm to fuse the occurrence probabilities of the four features, and the gesture with the maximum fused probability is taken as the recognition result. The rest of the paper is organized as follows. Preparatory work for hand gesture recognition is introduced in Section 2. The methods of feature extraction are presented in Section 3, including valid dynamic gesture judgment, feature definition, and feature sequence clustering. The HMM training model and hand gesture recognition by fusing the feature probabilities are proposed in Section 4. Section 5 comprises the experiments, results, and discussion. Conclusions and possible future extensions are given in Section 6.

Leap Motion and Data Acquisition
LM, based on time-of-flight technology, mainly consists of three infrared LEDs and two infrared cameras, which take photos from different directions to obtain gesture information in 3D space [16]. LM has a view field of about 150 degrees and an effective range of approximately 0.03 to 0.6 meters above itself. LM feeds back data frames that consist of the positions and velocities of key points, rotation information, and the frame timestamp.
When collecting gestures, LM establishes a right-hand coordinate system, as shown in Figure 1, for all obtained data such as the position, speed, and posture of the human hand. As shown in Figure 1, the five fingertips are denoted by F_i (i = 1, ..., 5), and the palm center is denoted by C. We mainly focus on the following data: (1) the palm normal vector n→ and palm direction vector h→, which represent the unit vector perpendicular to the palm plane and the unit vector pointing from the palm position toward the fingers, respectively; (2) the finger direction vector f→_i and the visible finger length d_i, which represent the unit vector pointing toward the fingertip F_i and the distance between the two endpoints of the finger, respectively; (3) the instantaneous velocities v_i of the five fingertips and the instantaneous velocity v_C of the palm center; and (4) the coordinate p_t(x_t, y_t, z_t), which represents the position of the palm center in frame t.

Computational Intelligence and Neuroscience

Dynamic Gesture Definition.
There are relatively few publicly available hand gesture datasets created from LM-sampled data, especially for dynamic hand gestures in robot teleoperation. We analyze the movement characteristics of the operator's hand commands in robot teleoperation, such as translation and rotation with three degrees of freedom, and create a gesture dataset named LM-Gesture3D, which contains eight different dynamic gestures, as shown in Table 1. All these gestures, collected by LM, represent practical operations or command signs and can be performed easily and naturally. Besides, there are similarities among the gestures in some respects, which will be illustrated in more detail later.

Feature Extraction
3.1. Valid Dynamic Gesture Judgment. Despite its many merits, LM mainly acts as a gesture data collector, similar to a wearable device or camera. Hence, conditions for judging the beginning and end of a valid dynamic gesture need to be given first. Taking LM-Gesture3D as an example, it can be seen that the fingertips and palm center inevitably produce rapid and continuous displacement when any of the gestures is performed. Even the simplest dynamic gesture, click, for example, is no exception. Based on the above analysis, a simple discriminant is established as follows:

‖v_C‖ > v_τ or ‖v_i‖ > v_τ (i = 1, ..., 5), (1)

where v_C and v_i are the instantaneous velocities of the palm center and fingertips, respectively, and v_τ is a predefined velocity threshold. When the total number of continuous frames in which v_C and v_i satisfy discriminant (1) reaches 60, the data frames are regarded as the original data of a valid dynamic gesture.
As LM is quite sensitive, discriminant (1) can be satisfied over a few consecutive frames both when the hand shakes slightly at rest and when the obtained data contain noise. The minimum total number of frames (i.e., 60) is therefore set to eliminate such useless data. In addition, a dynamic gesture performed at low speed will be judged invalid by discriminant (1), which means there is a degree of freedom in the speed of hand movement. The change in a dynamic gesture can be divided into the change in hand shape and the movement of the hand. The former can be further divided into the bending angle of the fingers, the opening angle between the fingers, and the palm posture; the latter is represented by the gesture trajectory. Therefore, the paper describes the changes in gestures through these four features. The specific extraction process and expression of the four features are as follows.
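The segmentation rule above can be sketched as follows. The threshold value and the exact combination of palm and fingertip speeds are assumptions, since the paper only fixes the 60-frame condition:

```python
# Hypothetical sketch of the valid-gesture judgment: a frame is "active" when
# the palm-center speed or any fingertip speed exceeds the threshold v_tau,
# and a run of at least 60 consecutive active frames is kept as a valid gesture.
V_TAU = 50.0      # assumed velocity threshold (mm/s); not specified in the paper
MIN_FRAMES = 60   # minimum run length from the paper

def frame_is_active(v_palm, v_tips, v_tau=V_TAU):
    """v_palm: scalar speed of the palm center; v_tips: speeds of the 5 fingertips."""
    return v_palm > v_tau or max(v_tips) > v_tau

def segment_valid_gestures(frames, v_tau=V_TAU, min_frames=MIN_FRAMES):
    """frames: list of (v_palm, v_tips) per frame; returns index ranges of valid gestures."""
    segments, start = [], None
    for t, (v_palm, v_tips) in enumerate(frames):
        if frame_is_active(v_palm, v_tips, v_tau):
            if start is None:
                start = t
        else:
            if start is not None and t - start >= min_frames:
                segments.append((start, t))
            start = None
    if start is not None and len(frames) - start >= min_frames:
        segments.append((start, len(frames)))
    return segments
```

Runs shorter than 60 frames (slight shaking, noise) are discarded, matching the filtering described above.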

Palm Posture Feature.
If the palm shape changes little during a dynamic gesture, the change in palm posture can be regarded as a problem of attitude angle calculation for a rigid body. The paper draws lessons from the 3D attitude measurement method pointed out in [23].
As shown in Figure 1, the palm posture in 3D space at any time can be uniquely determined by the palm normal vector n→ and the palm direction vector h→, which together represent the palm posture in frame t. We take the coordinate frame defined by the initial data frame of the dynamic gesture as the fixed coordinate system. The change in palm posture between the current frame and the first frame can then be represented with three Euler angles (yaw, pitch, and roll).

Figure 1: Data acquisition from leap motion.
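Given the palm vectors of the first and current frames, the Euler-angle computation can be sketched as follows. The Z-Y-X angle convention and the construction of a third basis vector via a cross product are assumptions; the paper does not state which convention [23] uses:

```python
import numpy as np

def basis(n, h):
    """Orthonormal palm basis from the unit palm normal n and palm direction h.
    Columns: palm direction, side vector n x h, palm normal."""
    n, h = np.asarray(n, float), np.asarray(h, float)
    s = np.cross(n, h)
    return np.column_stack([h, s, n])

def palm_euler_angles(n0, h0, nt, ht):
    """Euler angles (Z-Y-X convention, radians) of the rotation taking the
    first frame's palm basis (n0, h0) to the current frame's basis (nt, ht)."""
    R = basis(nt, ht) @ basis(n0, h0).T   # relative rotation matrix
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arcsin(-np.clip(R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    return yaw, pitch, roll
```

A 90-degree rotation of the hand about the vertical axis, for instance, shows up purely in the yaw angle.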

Bending Angle of Fingers.
As we mainly focus on the bending angle of the finger, the thickness of the finger can be ignored, and each finger can then be simplified to a planar model, as shown in Figure 2. Based on this model, Hong et al. [24] propose a method to estimate the hand's attitude, or rather the bending angles of the fingers and the coordinates of the joint points. In all conditions, their method requires merely the total length of the finger l_i, the visible length of the finger d_i, and several constraint constants. Combining with their research, we define the finger bending angle as

θ_i = arccos(d_i / l_i), i = 1, ..., 5, (3)

where d_i can be obtained directly from LM and l_i equals d_i when the finger is straight. In equation (3), l_i is used for normalization in order to make the approach robust to hands of different sizes. For l_i, a simple calibration method is applied before data acquisition. The user keeps his/her palm plane parallel to LM with the fingers opened as straight as possible. When the data of up to 30 continuous frames satisfy (1) |n_y| ≥ 0.94 and (2) v_i < v_τ (i = 1, ..., 5), the obtained visible lengths of the five fingers are recorded as the total lengths, where n_y is the component of the normal vector n→ along the Y-axis direction in the LM coordinate system.
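A minimal sketch of the bending-angle feature, assuming the angle is computed as arccos(d_i/l_i) from the length ratio (the exact form of the original equation is not preserved in this copy of the text):

```python
import numpy as np

def bending_angles(d, l):
    """Per-finger bending feature from visible lengths d_i and calibrated total
    lengths l_i. Assumes the feature is arccos(d_i / l_i): 0 for a straight
    finger (d_i = l_i), growing as the finger bends and d_i shrinks."""
    ratio = np.clip(np.asarray(d, float) / np.asarray(l, float), 0.0, 1.0)
    return np.arccos(ratio)
```

Because only the ratio d_i/l_i enters the feature, users with larger or smaller hands produce comparable values, which is the normalization purpose stated above.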

Opening Angle of Fingers.
The other descriptor for the fingers is the opening angle between fingers. As mentioned above, every single finger can be modeled on a plane. Thus, the problem of computing the angle between two fingers can be converted to that of calculating the angle between two planes. Here, the plane spanned by h→ and n→ is taken as the benchmark plane in the computation. Let h→ × n→ and f→_i × n→ be the normal vectors of the benchmark plane and the finger planes, respectively. The opening angle can then be calculated as follows:

α_i = arccos( (h→ × n→) · (f→_i × n→) / (‖h→ × n→‖ ‖f→_i × n→‖) ), i = 1, ..., 5. (4)
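The plane-angle computation follows directly from the two cross-product normals; a minimal sketch:

```python
import numpy as np

def opening_angle(h, n, f_i):
    """Angle between the benchmark plane (spanned by h and n) and the i-th
    finger plane, computed as the angle between their normals h x n and f_i x n."""
    n1 = np.cross(np.asarray(h, float), np.asarray(n, float))
    n2 = np.cross(np.asarray(f_i, float), np.asarray(n, float))
    c = np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards against rounding
```

For a finger direction perpendicular to the palm direction, the two planes are orthogonal and the opening angle is 90 degrees.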

Trajectory Feature.
A specific and meaningful trajectory usually accompanies some dynamic gestures, such as circling with a finger (like G5). So, the paper considers the path of the dynamic gesture and extracts a simplified feature for gesture recognition. When LM works, it can detect and return the space coordinates of the palm center with high accuracy and stability, so the moving trajectory of a hand can be expressed by a series of discrete points. The paper projects the gesture trajectory onto LM's principal gesture plane, i.e., the XOZ plane. The detailed feature extraction process is as follows: (1) Let (x_1, z_1), ..., (x_T, z_T) be the discrete points of the 2D gesture trajectory; then, the central point of these points is

p_0 = (x_0, z_0), x_0 = (1/T) Σ x_t, z_0 = (1/T) Σ z_t. (5)

(2) The vectors d_t from the central point p_0 to each trajectory point p_t are computed, each with a norm ‖d_t‖ and a direction angle φ_t. (3) The norm of each vector d_t is normalized with the maximum norm d_max, thus obtaining δ_t. Besides, the direction angles φ_t are converted into codes ψ_t according to the angular regions, as shown in Figure 3. Before coding the direction angle φ_t, we change the coordinate system from the original LM one into the coordinate system shown in Figure 3(a), the z′ axis of which always points from the central point p_0 to the first point p_1. The obtained trajectory features δ_t and ψ_t are scale and rotation invariant with respect to the operation plane.
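The three steps above can be sketched as follows. The eight 45-degree angular regions are an assumption read from the description of Figure 3:

```python
import numpy as np

# Sketch of the trajectory feature: project to the XOZ plane, take vectors from
# the trajectory's central point p0 to each sample, normalize lengths by the
# maximum, and quantize direction angles into 45-degree codes 1..8 measured in
# a frame whose z' axis points from p0 to the first point p1.
def trajectory_feature(points):
    pts = np.asarray(points, float)          # shape (T, 2): rows (x_t, z_t)
    p0 = pts.mean(axis=0)                    # central point, equation (5)
    d = pts - p0                             # vectors p0 -> p_t
    norms = np.linalg.norm(d, axis=1)
    delta = norms / norms.max()              # scale-invariant lengths
    phi0 = np.arctan2(d[0, 0], d[0, 1])      # rotate so z' points at p_1
    phi = np.arctan2(d[:, 0], d[:, 1]) - phi0
    psi = (np.floor(np.mod(phi, 2 * np.pi) / (np.pi / 4)).astype(int) % 8) + 1
    return delta, psi
```

Rotating or uniformly scaling the whole trajectory leaves (δ_t, ψ_t) unchanged, which is the invariance claimed above.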
Typical data are selected once for each gesture in the LM-Gesture3D dataset, and their feature diagrams are built, as shown in Figure 4. Each row in Figure 4 corresponds sequentially to one of the gestures in LM-Gesture3D. The four diagrams in each row, from left to right, are the palm posture, finger bending angle, finger opening angle, and trajectory, respectively. It is not hard to see that each feature diagram depicts nicely how its corresponding gesture is performed. A gesture with complicated changes usually corresponds to complex feature curves, and vice versa. Different gestures may have similar features. The palm posture feature of G1-G3, for example, is similar to that of G6-G8, and the finger bending angles and finger opening angles of G6-G8 are similar to each other. Therefore, it is not easy to distinguish these gestures with just a single feature. Of course, some gestures have significantly different features, like G1 and G2, so there is no misrecognition between G1 and G2.
There may be more distinguishing features that could improve the recognition rate and reduce the computation cost for a given gesture. However, considering that the eight kinds of gestures in LM-Gesture3D have obvious similarities, we prefer to select a feature set with completeness and redundancy that meets the requirements of unified modeling and recognition of the gestures. According to the description of Figure 4, some features in the defined feature set are similar to each other for different gestures. However, distinct features are also included in the defined feature set. So, on the whole, the collected LM-Gesture3D gestures, or more kinds of dynamic gestures, can be adequately represented and distinguished by the four defined types of features.
Of all four features, the finger bending angle and finger opening angle are not affected by the acquisition direction. To verify whether the remaining two kinds of features are rotation invariant, we obtain the hand data of gesture G6 from an experimenter, who is asked to make gesture G6 twice during the collecting period. Then, we extract the posture feature and trajectory feature from the collected hand data and draw the feature curves, as shown in Figure 5.

Feature Sequence Clustering.
As shown in Table 2, in a single data frame of a gesture, the four features can be represented by m_i (i = 1, 2, 3, 4) dimensional vectors, respectively. Accordingly, each feature over the T data frames of a dynamic gesture forms a T × m_i dimensional vector sequence. In order to build the model of a discrete HMM, the K-means algorithm [25] is used to cluster the feature vectors in the sequence. After clustering a feature vector into q classes, the feature vector sequence can be expressed as O = o_1, ..., o_t, ..., o_T, where o_t ∈ {1, ..., q} indicates that the feature vector is closest to the cluster center numbered o_t. The cluster numbers q of the four kinds of features are shown in Table 2.

In short, we take the discrete feature sequence composed of cluster tags as the input of the discrete HMM. Therefore, both the sample data for the HMM training stage and the gesture data for HMM recognition need to go through the steps of feature extraction and clustering.
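The quantization step can be sketched with a minimal Lloyd's K-means (a stand-in for the K-means of [25]; the cluster counts q come from Table 2):

```python
import numpy as np

# Minimal Lloyd's K-means: quantizes a T x m feature-vector sequence into
# 1-based cluster labels o_t in {1, ..., q}, the discrete observations fed
# to the HMM. Initialization and iteration count are simplifying assumptions.
def kmeans_quantize(X, q, iters=50, seed=0):
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), q, replace=False)]   # random initial centers
    for _ in range(iters):
        # assign each vector to its nearest center
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned vectors
        for j in range(q):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels + 1, centers   # 1-based labels o_t
```

In practice the centers would be fitted on the training data and reused to quantize unseen gestures, so training and recognition share one codebook.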

Recognizing Flow.
The recognition process is shown in Figure 6 and can be divided into two parts. The first part deals with accurate gesture segmentation and the extraction and quantification of the four features. The second part includes HMM model training and gesture recognition, both of which are based on the extracted feature sequences.
The formal features of an HMM can be expressed with a 5-tuple (Ω_X, Ω_O, A, B, π), where Ω_X = {q_1, ..., q_N} is the finite set of Markov chain states, and N is the number of states; Ω_O = {V_1, ..., V_M} is the finite set of observation symbols, and M is the number of symbols; A = (a_ij)_{N×N} is the matrix of state transition probabilities; B = (b_ij)_{N×M} is the matrix of observation probabilities; and π = (π_1, ..., π_N) is the initial state probability distribution.

HMM Training.
Unlike the common pattern of one HMM per kind of gesture, we build one HMM model for each feature, which means that 4 HMM models are used to recognize each performed unknown gesture. Taking LM-Gesture3D as an example, the designed 8 gestures are denoted by g_u, u = 1, ..., 8; then, for the feature sequence S_u^v (v = 1, ..., 4) of gesture g_u, the following HMM modeling process is carried out: (1) HMM initialization: according to Table 1, N is set to 6 in this paper. The number of observation symbols M is set to the number of cluster centers shown in Table 2; the initial model parameters are denoted λ_u^v = (A, B, π). (2) HMM parameter reestimation: assume that the feature sequence S_u^v consists of K observation sequences O^(k), where k = 1, ..., K, and each observation sequence can be represented as O^(k) = o_1^(k), ..., o_T^(k). The observation sequences O^(k) and the current model parameters λ_u^v are substituted into the Baum-Welch reestimation equations to recompute π_i, a_ij, and b_js, from which a new model (Ā, B̄, π̄) is obtained. The above process is repeated until the parameters of two adjacent iterations satisfy

|P(O | λ̄) − P(O | λ)| < ε,

where P(O | λ) is calculated with the forward-backward algorithm and indicates the occurrence probability of the observation sequence O under the parameters λ, and ε is a predefined convergence threshold. The final model parameter λ_u^v is the optimal parameter for feature sequence S_u^v, that is, the single-feature HMM of its corresponding gesture. By repeating the above modeling process for each feature sequence S_u^v of the 8 dynamic gestures, we obtain 32 single-feature HMM models in total.

Figure 6: Implementation flow of the proposed method for recognizing dynamic gestures.
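The evaluation quantity P(O | λ), used both in the convergence test above and in recognition, can be sketched with the standard forward algorithm for a discrete HMM:

```python
import numpy as np

# Forward algorithm for a discrete HMM: computes P(O | lambda) for an
# observation sequence O of 1-based symbols under lambda = (A, B, pi).
def forward_probability(O, A, B, pi):
    A, B, pi = map(np.asarray, (A, B, pi))
    alpha = pi * B[:, O[0] - 1]              # alpha_1(i) = pi_i * b_i(o_1)
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o - 1]    # induction over t = 2, ..., T
    return alpha.sum()                       # P(O | lambda) = sum_i alpha_T(i)
```

For long sequences a scaled or log-space variant would be used to avoid underflow; the plain form is kept here for clarity.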

Gesture Recognition with HMM
We present an algorithm of weighted probability fusion to compute the probability that an unknown gesture belongs to gesture u in LM-Gesture3D as follows:

P_u = Σ_{v=1}^{4} ω_uv P(O^v | λ_u^v), (10)

where ω_uv (0 ≤ ω_uv ≤ 1 and Σ_{v=1}^{4} ω_uv = 1) is the weight of feature v corresponding to gesture u.
According to equation (10), there are 8 calculation results, of which the maximum is regarded as the recognition result of the unknown gesture. The paper employs the least squares method (LSM) to determine ω_uv in equation (10). Here is a brief introduction to the LSM weighting method. Firstly, we calculate the probabilities of the four features for all samples in the training dataset, obtaining P^m = (P_1^m, ..., P_8^m) (m = 1, ..., L), where L is the number of samples and P_u^m = (P_u1^m, P_u2^m, P_u3^m, P_u4^m). Secondly, for gesture u in LM-Gesture3D, if sample m belongs to it, the target probability of sample m corresponding to gesture u is set to p_s; else, the target probability of sample m corresponding to gesture u is set to 1 − p_s, where p_s (0.5 ≤ p_s ≤ 1) is a preset probability. The weights ω_uv are then obtained by minimizing the squared error between the fused probabilities and these targets over all training samples.
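A sketch of the weight fitting and fusion. The least-squares targets (p_s for samples of gesture u, 1 − p_s otherwise) and the clipping/renormalization used to enforce the weight constraints are assumptions:

```python
import numpy as np

# Fit per-gesture fusion weights by least squares: for gesture u, solve
# min_w ||P_u w - y||^2, where row m of P_u holds the per-feature probabilities
# (P^m_u1, ..., P^m_u4) and y_m is p_s for samples of gesture u, else 1 - p_s.
def fit_weights(P_u, is_u, p_s=0.9):
    P_u = np.asarray(P_u, float)
    y = np.where(is_u, p_s, 1.0 - p_s)
    w, *_ = np.linalg.lstsq(P_u, y, rcond=None)
    w = np.clip(w, 0.0, None)        # enforce 0 <= w_uv
    if w.sum() == 0:
        w = np.ones_like(w)          # degenerate fallback
    return w / w.sum()               # enforce sum_v w_uv = 1

def fused_score(probs, w):
    """Equation (10): weighted sum of the per-feature HMM probabilities."""
    return float(np.dot(probs, w))
```

At recognition time, `fused_score` is evaluated for all 8 gestures and the maximum is taken as the result.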

Experiments
To test the performance of the proposed method, several experiments are carried out on a desktop PC with an Intel Core i5-3230M processor and 4 GB of RAM; the software environment consists of Visual Studio 2013, Leap Motion SDK 2.3.1+3154, and MATLAB 2012a.

LM-Gesture3D Recognition Experiment.
We select four participants with certain experience in robot teleoperation to join the experiment. Each participant is asked to repeat each gesture in LM-Gesture3D 40 times, and LM samples their gestures. So, there are 160 samples of each gesture.
To verify the feasibility of the proposed method, we define the recognition rate as

R = N_Rec / M_Sam × 100%,

where N_Rec is the number of gestures correctly recognized and M_Sam is the total number of gestures tested. Firstly, we use K-fold cross-validation to evaluate the recognition performance and stability of the proposed method. In this experiment, K is set to 10, so each subset has 128 samples. Figure 7 shows the result of the K-fold cross-validation: the recognition rates of the different trained models range from 89.8% to 92.9%. The fluctuation range of the recognition rates of all 10 trained HMM models is about 3%, which shows that the proposed method has good stability. The average recognition rate of all 10 trained HMM models is about 90.8%, which indicates that the proposed method has good recognition performance.
Furthermore, we analyze the recognition performance of the proposed method for the different types of gestures in LM-Gesture3D. We randomly select 60 samples of each gesture as the testing set and the remaining samples as the training set. Table 3 shows the recognition results. From the table, we can see that our method represents the 8 dynamic gestures well, with an average recognition rate of about 90.6%. The recognition rates for all gestures fluctuate slightly between 88.3% and 91.7%. The recognition rates of G4 and G6-G8 are higher than those of G1-G3 and G5. The reason is that these gestures are relatively simpler and easier for different users to repeat, while G1-G3 and G5 are easily influenced by the participant's individual habits. In addition, gestures G1-G3 are easily confused with G6-G8, respectively.

In general, the recognition results are jointly determined by the four kinds of features, and our method based on multiple features and HMMs can represent most kinds of complex gestures, which proves that our method is effective.

Dynamic Gesture Recognition Experiments.
This experiment mainly tests our method's recognition rate for two kinds of relatively simple dynamic gestures, collected in the letter-gesture dataset and the waving-gesture dataset, respectively. As shown in Figure 8(a), the letter-gesture set consists of 6 gestures, numbered 1 to 6, which are similar to each other. The waving-gesture dataset contains the remaining 6 gestures, shown in Figure 8(b). It can be seen that the main features of the two gesture sets are the trajectory feature and the palm posture feature, respectively. The gestures in the experiment are sampled from four participants, each of whom is asked to repeat each gesture 50 times. When collecting the letter-gesture dataset, each participant keeps the hand shape as unchanged as possible and parallel to the horizontal plane of LM. Each gesture's data are further divided into 120 sets of training samples and 80 sets of testing samples.
Chen et al. [17] propose a rapid early recognition system based on SVM to achieve multiclass classification among 36 dynamic gestures (the 3D motion trajectories of the numbers and the alphabet). Chen's method uses LM to capture the 3D motion trajectories of the gestures, as our method does. In Chen's method, the orientation angle is utilized as the unique feature of the gesture trajectory projected onto the XOZ plane; it is quantized by dividing it by 45° and coded from 1 to 9, which is similar to our method. Chen's method is also used to recognize the gestures in the letter-gesture dataset. Figure 9 shows the recognition results of our method and Chen's method, which achieve average recognition rates of 96.0% and 93.5%, respectively. The two approaches have very similar recognition rates. However, the fluctuation of our method's recognition rate with LSM weights is smaller than that of Chen's approach, which shows that our method has better recognition stability.
In addition, the directional code extracted by Chen's method is determined by two neighboring points on the trajectory, whereas that of our method is determined by the trajectory points and the central point. At the same time, we also introduce a distance feature. Therefore, the trajectory feature extracted by our method is not affected by the amplitude of the gesture and is rotation invariant.
Based on the above analysis, we believe that our method performs better than Chen's method. The waving directions of gestures 7-10 in the waving-gesture dataset are from upper right to lower left, from upper left to lower right, from top to bottom, and from bottom to top, respectively, and gestures 11 and 12 make roughly 90° clockwise and counterclockwise rotations, respectively. It can be seen that this kind of dynamic gesture can be distinguished easily using palm posture features. We carry out an experiment to test the recognition performance of our method on these 6 kinds of gestures. In the experiment, the method of data acquisition and processing is the same as that in the letter-gesture experiment.
Pan et al. [26] present a combined method based on rule-based classification and SVM to recognize gestures. Their method also uses LM to capture real-time frame data of hand motion and defines a 14-dimensional feature set including the absolute pose of the hand in the 3D coordinate system and the pose changes of the hand between two frames. Pan's method is also used to recognize the gestures in the waving-gesture dataset. Figure 10 shows the recognition results of our method and Pan's method. The recognition rates of the two methods for gestures 7-12 are all over 90%, and the average recognition rates are 90.4% and 90.8%, respectively. The average recognition rate of Pan's method is slightly higher than that of our method.
Compared with our method, Pan's method leads to higher computational costs because it selects high-dimensional features and adopts a two-step recognition strategy. Our method not only has a high recognition rate but also possesses rotational invariance, since it selects the rotation angles relative to the initial hand posture as features. Our method works well for recognizing waving or rotation gestures, such as those in the waving-gesture dataset.
In addition, all three methods above use LM to sample the gestures. The data for the features defined by the three methods can be obtained quickly and accurately by LM. With a camera-based approach, by contrast, we would have to depend on hand-area features to recognize the gestures, which is more complex and challenging. Hence, we can conclude that LM brings excellent benefits to our research.

Generalization Experiment.
A generalization experiment is carried out to verify the adaptability of our method to nonstandard gestures. We select four inexperienced participants for the experiment, each of whom is asked to repeat each gesture from LM-Gesture3D 40 times. A total of 1280 gestures are sampled, which are recognized with the HMM models and the same weights built in the LM-Gesture3D recognition experiment. The average recognition rate of 90.5%, shown in Figure 11, is very similar to that of the LM-Gesture3D recognition experiment. So, the method is adaptable to nonstandard gestures and has a good generalization ability.
We define the positive predictive value (PPV) and accuracy (ACC) of gesture Gi (i = 1, 2, ..., 8) as follows:

PPV_Gi = T_Gi / (T_Gi + F_Gi),

where T_Gi is the number of gesture Gi correctly recognized and F_Gi is the number of the other seven gestures incorrectly recognized as Gi, and

ACC_Gi = T_Gi / (T_Gi + Σ_{j≠i} F_GiGj),

where F_GiGj is the number of gesture Gi incorrectly recognized as gesture Gj. Table 4 shows the confusion matrix of the generalization experiment using the proposed method. According to Table 4, except for G5, with a PPV of about 0.96, the recognition precisions of the other seven gestures differ little, ranging from 0.89 to 0.91.
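Both measures can be read off a confusion matrix such as Table 4; the sketch below assumes rows index the true gesture and columns the recognized one:

```python
import numpy as np

# PPV and ACC per gesture from a confusion matrix C, where C[i, j] counts
# samples of gesture Gi recognized as Gj (so the diagonal holds T_Gi).
def ppv_acc(C):
    C = np.asarray(C, float)
    T = np.diag(C)
    ppv = T / C.sum(axis=0)   # correct / everything predicted as Gi
    acc = T / C.sum(axis=1)   # correct / all true samples of Gi
    return ppv, acc
```

The column sum collects the false recognitions of other gestures as Gi (F_Gi), and the row sum collects the misrecognitions of Gi as other gestures (F_GiGj).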

Comparison Experiment with Other HMM-Based Methods.
Here, we compare the recognition performance of the proposed method with other HMM-based recognition methods. The authors in [11] define three features, including hand shape, palm trajectory, and distance from the camera, to extract the hand model from image features, and propose a combinatorial method based on HMM and BPNN. The HMM-BPNN method uses the classical HMM to evaluate the dynamic gesture features and then uses a BP neural network to classify the resulting state sequence.
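The shape of that HMM-then-BPNN pipeline can be sketched roughly as follows. The network size, weights, and function names are purely illustrative assumptions; a one-hidden-layer sigmoid network stands in for the BP neural network of [11], and the input vector stands in for the HMM evaluation scores.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bpnn_classify(hmm_scores, W1, b1, W2, b2):
    # hmm_scores: vector of HMM evaluation scores for one gesture
    # sample (one score per trained class HMM). A one-hidden-layer
    # network maps the score vector to class scores; the argmax
    # picks the recognized class.
    h = [sigmoid(sum(w * x for w, x in zip(row, hmm_scores)) + b)
         for row, b in zip(W1, b1)]
    out = [sum(w * hi for w, hi in zip(row, h)) + b
           for row, b in zip(W2, b2)]
    return out.index(max(out))
```

In practice the weights would be learned by backpropagation on labeled score vectors; the sketch only shows the forward classification step.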
In this experiment, the samples are taken from the LM-Gesture3D recognition experiment in Section 5.1: 60 samples of each gesture are randomly selected as the testing set, and the remaining samples are used as the training set. The experiment is divided into two parts: feature testing and algorithm testing. The feature testing experiment uses the features defined in [12] to describe the gestures and analyzes the recognition rate of the HMM-BPNN method. Table 5 shows the recognition results. From the table, we can see that the HMM-BPNN method has an average recognition rate of only about 50.83% for the 8 dynamic gestures. Moreover, its recognition rate fluctuates greatly across gesture types. The main reason for the low recognition rate of the HMM-BPNN method on LM-Gesture3D is that the three types of 2D features it defines are only suitable for representing simple and highly differentiated gestures; they cannot fully represent complex and highly similar gestures such as G5. The algorithm testing experiment uses the features defined by our method to describe the gestures and analyzes the recognition rate of the HMM-BPNN method again. Table 6 shows the recognition results of this experiment.
From Table 6, we can see that the HMM-BPNN method has an average recognition rate of about 80.83% for the 8 dynamic gestures, about 30 percentage points higher than in the feature testing experiment, and its recognition rate fluctuates less across gesture types. The results show that our features represent the complex gestures in LM-Gesture3D more effectively than those of the HMM-BPNN method.
For the same gesture samples and the same defined features, the recognition rate of our method, shown in Table 3, is more than 90%, about 10 percentage points higher than that of the HMM-BPNN method. We see two main reasons for the relatively low recognition rate of the HMM-BPNN method. First, the input to the BPNN classifier is decided by taking the maximum over the probabilities produced by the trained HMM models of the four feature types, which does not account for interference between similar features. Second, the BP neural network is prone to falling into local minima, which increases the risk of misrecognition when different sample features are highly similar.
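By contrast, the weighted fusion used by our method combines all four per-feature HMM scores before the maximum is taken. A minimal sketch, assuming each gesture class has four trained per-feature HMMs that each return a log-likelihood for the corresponding observation sequence; the weight values and score numbers below are illustrative assumptions, not the trained models' outputs.

```python
def fused_score(scores, weights):
    # Weighted sum of the four per-feature HMM log-likelihoods
    # (palm posture, bending angle, opening angle, trajectory).
    return sum(w * s for w, s in zip(weights, scores))

def recognize(all_scores, weights):
    # all_scores: {gesture: [four per-feature log-likelihoods]}.
    # The gesture with the largest fused score is recognized.
    return max(all_scores, key=lambda g: fused_score(all_scores[g], weights))
```

For example, with equal weights `[0.25] * 4` and scores `{"G1": [-1, -2, -1, -2], "G2": [-5, -1, -1, -1]}`, G1 fuses to -1.5 and G2 to -2.0, so G1 is returned even though G2 scores higher on three of the four features.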

Conclusion
In this paper, a fusion recognition method based on multiple features and HMM is proposed for dynamic gestures. We consider both the change in hand shape and the moving trajectory and build four sorts of hand features that are straightforward, simple, and rotation invariant, giving the operators better naturalness and flexibility of operation. Moreover, these features allow further extension to more kinds of complex dynamic gestures. For each feature, we build a corresponding HMM. In the recognition stage, we present a weighted fusion algorithm that combines the occurrence probabilities to obtain the final recognition result, so the result is not easily dominated by any single feature. The experimental results show that the proposed method is suitable for relatively simple dynamic gestures like letter gestures and waving gestures, and it is also robust for complex dynamic gestures like LM-Gesture3D. The average recognition rate of the proposed method for LM-Gesture3D reaches 90.6%, and the average recognition rate for inexperienced participants is about 90%. These results demonstrate the usability and feasibility of the proposed method.
Like other gesture recognition methods, the proposed method inevitably has certain limitations, and more in-depth study is needed. First, since we adopt four HMMs for each gesture, the algorithm's efficiency remains to be improved. Second, we have not yet studied adaptive weighting methods and their impact on the recognition rate, which will also be a future research direction.
Data Availability
The research library related to this work will be established on GitHub (https://github.com/glchenwhut), where the folders containing the experimental data and lists can be accessed.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors' Contributions
Guoliang Chen conceived the idea, designed the experiments, and wrote the paper. Kaikai Ge helped with the algorithm and with analyzing the experimental data.