A Novel PSO-Based Optimized Lightweight Convolution Neural Network for Movements Recognizing from Multichannel Surface Electromyogram

Because surface electromyography (sEMG) serves as a medium of human-computer interaction, it is crucial to interpret its motion information correctly and quickly. Deep learning can recognize a variety of sEMG actions by end-to-end training. However, most of the existing deep learning approaches have complex structures and numerous parameters, which make the network optimization problem difficult to solve. In this paper, a novel PSO-based optimized lightweight convolution neural network (PLCNN) is designed to improve the accuracy and optimize the model with applications in sEMG signal movement recognition. To reduce the structural complexity of the deep neural network, the designed convolution neural network model is mainly composed of three convolution layers and two full connection layers. Meanwhile, particle swarm optimization (PSO) is used to optimize the hyperparameters and improve the self-adaptive ability of the designed sEMG pattern recognition model. To further indicate the potential application, three experiments are designed according to the progressive process of body movements with respect to the Ninapro standard data set. Experimental results demonstrate that the proposed PLCNN recognition method is superior to the other popular classification methods compared.


Introduction
Human-computer interaction is one of the most popular topics in the field of signal processing. Detecting and judging the behavior intention leads to information communication between human and computer. Generally, physiological signals and RGB images are two mainstream pieces of information to capture human activities [1][2][3][4][5]. The recognition of physiological-signals-based body movements is the key to human-computer interaction [6][7][8], due to the problems of the image, such as occlusion by the environment, inability to be distinguished accurately, and difficulty in being segmented. In particular, as a physiological signal of the human body, sEMG is a complex electrical signal produced by muscle contraction and relaxation, which carries the movement information of corresponding parts. Decoding sEMG to recognize human behavior characteristics is called sEMG pattern recognition [9,10]. Therefore, the machine is capable of representing the human movement intention; moreover, it is the popular research frontier in the field of human-computer interaction.
sEMG pattern recognition consists of the following three parts: signal preprocessing, feature extraction, and feature classification. Signal preprocessing involves filtering, denoising, and detecting active segments, etc. [11]. Feature extraction includes the following four aspects: time domain, frequency domain, time-frequency domain, and nonlinear dynamics; see [12,13]. The papers [14][15][16][17] adopt different improved methods combining time domain and time-frequency domain to extract representative features of EMG.
After the sEMG feature is extracted, it is sent to the classifier for training. Generally, traditional machine learning algorithms such as support vector machine (SVM), random forest (RF), and k-nearest neighbour (KNN) are used in previous research. Particularly, sEMG is identified by changing the rejudgment condition of SVM's misclassification data in [18,19]. It has been proven that traditional algorithms have achieved better results. However, since the voltage of sEMG is relatively weak (mV level), the voltage is susceptible to noise interference. It is difficult for the features extracted by traditional methods to achieve accurate results for actual requirements. Moreover, traditional methods need to select features and classifiers manually.
Compared with the traditional sEMG classification method, the neural network can avoid describing and extracting the features of weak sEMG data. It can learn the features independently and combine the extracted features with classification to improve recognition accuracy [20][21][22][23][24]. In recent years, EMG pattern recognition has been combined with deep learning and neural networks, such as artificial neural network (ANN), convolutional neural network (CNN), and recurrent neural network (RNN), which have been used to recognize sEMG. In the process of sEMG acquisition, the electrodes have different spatial positions. CNN is effective in sEMG acquisition because it can extract the local and global features of sEMG. Fifty-two kinds of sEMG gestures have been recognized for the first time by a simple CNN model that achieves the accuracy of general classical algorithms in [25]. A CNN network model, composed of four convolution layers, four pooling layers, and two fully connected layers, has been used to improve the variability of different testers in hand movements in [26]. Moreover, some improved algorithms based on CNN technique have been investigated to recognize the sEMG gesture during the past few years; see [27,28].
Until now, sEMG research based on CNN has also faced some problems. Firstly, the computation of a CNN with many layers is expensive, which leads to delay in the interaction between humans and computers. For example, the recognition process in [29] is slow even though eleven convolution layers are used to extract features. Secondly, it is difficult to manually set hyperparameters and conduct experiments to verify this setting. Most researchers set the network hyperparameters, which affect the accuracy of the model, based on previous experience or just pick them randomly [30]. These artificial settings accumulate the uncertainty of sEMG recognition. Finally, the accuracy of EMG pattern recognition can reach more than 90% when classifying fewer than ten gestures. On the contrary, the accuracy will be significantly reduced when more than ten gestures are required to be classified. This limits the ability of the model to interpret human motion intention. For example, the results for the dual-flow network based on CNN [30] show that the more actions are recognized, the lower the accuracy of the network is.
Following the above discussions, the purpose of this paper is to propose a new hand movement recognition method with applications to the analysis of multichannel sEMG signals. PSO is used to adjust the network hyperparameters, eliminate interference, classify more gesture actions, and improve the accuracy of EMG pattern recognition. The proposed method can even recognize three groups of hand movements in terms of sensitivity, flexibility, and dynamics, which is superior to most of the state-of-the-art methods on the Ninapro standard data set. The main contributions are summarized as follows:
(1) A novel deep learning framework is constructed based on lightweight convolutional neural network (LWCNN). The proposed LWCNN, which just contains 10952 parameters and three convolution layers, is an end-to-end sEMG feature extraction network. It solves the problem of computational complexity and improves the efficiency of human-computer interaction.
(2) The PSO algorithm is exploited to optimize the hyperparameters of LWCNN, which leads to a novel PLCNN classification model. A linear decline strategy is used to redefine the inertia weight to avoid the situation in which PSO falls into local optimum. Meanwhile, the cross-entropy loss function of the network is set as the fitness function to make predicted results close to the real results.
(3) The proposed PLCNN deep learning algorithm is successfully employed to analyze the multichannel sEMG signals in order to verify hand movement recognition. With an appropriate recognition method, the speed and accuracy of hand movement recognition could be much increased. Thus, the efficiency of human-computer interaction can be improved.
The remainder of this paper is organized as follows. The proposed framework for sEMG pattern recognition and the designed algorithms are introduced in Section 2. Descriptions of data and experimental configuration are introduced in Section 3. The PLCNN method for sEMG pattern recognition is discussed and the experimental results, demonstrating the effectiveness of the developed framework, are provided in Section 4. Finally, conclusions are given in Section 5.

System Framework and Method
As shown in Figure 1, we propose a novel deep learning network for sEMG pattern recognition. The network is suitable for a simple one-dimensional signal, which has low complexity and few parameters. Moreover, the PSO algorithm is used to automatically select the network's hyperparameters to avoid expensive computation in gesture recognition.

Lightweight Convolution Neural Network (LWCNN) Model
Researchers have found that CNN has important spatial significance for sEMG, because CNN has excellent performance on image and speech processing. In respect of EMG pattern recognition, CNN can extract the spatial correlation features of sEMG signal according to the position of electrodes in space. Feature extraction is integrated into feature classification by local connection and weight sharing, which removes the complexity of traditional recognition methods where feature extraction and data reconstruction are needed. The main network structure of CNN includes four parts: convolution layer, pooling layer, full connection layer, and activation function. The general CNN network structure is too complex, which often leads directly to a slow convergence rate of training. Furthermore, complex structures can cause network overfitting and even reduce the accuracy of EMG pattern recognition. Moreover, as the structural complexity of the network increases, it becomes difficult to optimize the hyperparameters of the model. Therefore, we design a lightweight CNN structure network called LWCNN, which not only includes fewer parameters but also could accelerate training speed. As shown in Table 1, the convolution layers' parameters of LWCNN are significantly fewer than those of the state-of-the-art methods, which is more conducive to subsequent optimization.
It can be seen from Figure 2 that the designed LWCNN consists of four parts for sEMG pattern recognition, mainly inputting sEMG data, extracting sEMG features, classifying sEMG features, and outputting classification results, where a, b, c are the numbers of filters and d, e, f are filter sizes.
(i) Part 1: the first part is the input module. The preprocessed sEMG signal is inputted with the size of 1 × insize, which is convenient for feature extraction in subsequent convolutional layers.
(ii) Part 2: the second part consists of six layers: three convolution layers, two pooling layers, and an activation function. Convolution layers are the most important layers in the whole structure: conv-1, conv-2, and conv-3. Pooling layers are pool-1 and pool-2. The first layer is conv-1, which has a filters of size 1 × d. The sEMG is convolved with multiple filters to obtain the spatial features of the signal. The second layer is the ReLU activation function, which preserves and maps the activated features to increase the nonlinear representation of the network. The third layer is pool-1, which performs an average pooling for the features. It not only reduces the size of features to simplify the complexity of the network, but also extracts the representative features to the corresponding feature map. The fourth layer is conv-2 with b filters of size 1 × e. Afterwards, there is a pool-2 layer for the local average. The last layer is conv-3, which contains c filters with the size of 1 × f.
(iii) Part 3: the third part consists of a flatten layer, two dropout layers, and two full connection layers called dense-1 and dense-2. This part mainly synthesizes all the extracted sEMG features from the second part and expands them to obtain global information. Moreover, some of the features are discarded to prevent network overfitting. Finally, the processed features are mapped to the simple label space to prepare for gesture recognition.
(iv) Part 4: the fourth part is the output module, in which the softmax function is used to classify the features of the network. Then, we obtain the predicted results of hand movements as output.

Figure 1: The proposed network architecture for sEMG recognition. The original sEMG, which has been preprocessed by filtering and denoising, is sequentially fed to LWCNN for feature extraction to obtain corresponding feature maps. The feature map is then flattened into feature vectors and passed to the fully connected layer for preliminary identification. To better represent different gesture features, the PSO model is adopted finally.
Algorithm 1 gives the process of LWCNN model training for sEMG signal, where a, b, c, d, e, f, batch size, and learning rate are the network hyperparameters which PSO needs for optimization as shown in Table 2.
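To make the layer-by-layer description concrete, the following Python sketch traces the feature-map size through conv-1, pool-1, conv-2, pool-2, and conv-3 and counts the convolution parameters. It is an illustration under stated assumptions, not the authors' implementation: unpadded convolutions with stride 1, a pooling window of 2, and the example values chosen for a to f are all hypothetical, since the text does not state them.

```python
def conv1d_out(length, kernel, stride=1):
    """Output length of an unpadded ('valid') 1-D convolution (assumed padding)."""
    return (length - kernel) // stride + 1


def pool1d_out(length, pool=2):
    """Output length of non-overlapping average pooling (pool size assumed)."""
    return length // pool


def lwcnn_feature_shape(insize, a, b, c, d, e, f, pool=2):
    """Return (channels, length) of the feature map after conv-3."""
    length = conv1d_out(insize, d)     # conv-1: a filters of size 1 x d
    length = pool1d_out(length, pool)  # pool-1 (ReLU in between keeps the shape)
    length = conv1d_out(length, e)     # conv-2: b filters of size 1 x e
    length = pool1d_out(length, pool)  # pool-2
    length = conv1d_out(length, f)     # conv-3: c filters of size 1 x f
    return c, length


def conv_param_count(a, b, c, d, e, f, in_channels=1):
    """Weights plus biases of the three convolution layers."""
    p1 = (in_channels * d + 1) * a
    p2 = (a * e + 1) * b
    p3 = (b * f + 1) * c
    return p1 + p2 + p3


# Hypothetical hyperparameters, with insize = 597 as reported for Exercise A.
shape = lwcnn_feature_shape(597, a=8, b=16, c=32, d=5, e=3, f=3)
params = conv_param_count(a=8, b=16, c=32, d=5, e=3, f=3)
```

Because a to f all enter these formulas directly, a change in any of them alters both the size of the flattened feature vector and the parameter count, which is one reason they are natural candidates for automatic search.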

The PSO Optimized LWCNN Model
The LWCNN model has a significant effect on the spatial processing of the sEMG signal, but the settings of hyperparameters can directly affect the performance of the model. Manually set hyperparameters not only complicate the work, but also slow down the EMG recognition process. Moreover, when the hyperparameters of LWCNN are optimized manually, the generalization ability of the model is weak.
PSO is an iterative optimization algorithm [31,32]. Its idea comes from the observation of the foraging behavior of birds. The basic idea is that each individual in the population iteratively updates itself according to the initial value. To obtain the optimal result, individuals share information with each other to adjust the population. Due to the advantages of search speed, wide search range, simple structure, and easy implementation, PSO stands out among many optimization algorithms. In this paper, we utilize PSO to adjust the network hyperparameters and automatically find the most suitable deep learning structure for sEMG pattern recognition.
PLCNN takes 10 hyperparameters in LWCNN structure as 10 dimensions of each particle in PSO. These hyperparameters include the number of convolution kernels, the size of convolution kernels, activation function, batch size, and learning rate. According to the classic networks such as VGG and ResNet [33,34], the initial value range of PLCNN is shown in Table 2.
Suppose that, in a ten-dimensional space, m particles constitute a population without mass and volume. Each particle is represented by a velocity vector and a position vector in space. In the k-th iteration, the velocity vector of the l-th particle is $v_l^k = (v_{l1}^k, v_{l2}^k, \ldots, v_{l10}^k)$ and its position vector is $x_l^k = (x_{l1}^k, x_{l2}^k, \ldots, x_{l10}^k)$. The l-th particle then performs the next update according to its own best position over all iterations and the best position found by the m particles in the population. The best position of the particle is $x_{lp} = (x_{lp}(1), x_{lp}(2), \ldots, x_{lp}(10))$ and the best solution of the population is $x_g = (x_g(1), x_g(2), \ldots, x_g(10))$. The velocity and position updating formulas of the l-th particle are

$$v_l^{k+1} = w v_l^k + c_1 r_1 (x_{lp}^k - x_l^k) + c_2 r_2 (x_g^k - x_l^k), \quad (1)$$

$$x_l^{k+1} = x_l^k + v_l^{k+1}, \quad (2)$$

where k is the number of iterations; $x_{lp}^k$ is the best position of particle l after the k-th iteration; $x_g^k$ is the best position of the population after the k-th iteration; w is the inertia weight; $c_1$, $c_2$ are the learning factors; and $r_1$, $r_2$ are random numbers over the interval (0, 1).
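As an illustration of a single particle update, the sketch below applies the standard PSO velocity and position rules to one particle stored as plain Python lists. It is a minimal sketch, not the authors' code: the function name `pso_step` is hypothetical, and drawing fresh random numbers r1 and r2 for every dimension is an assumption, since the text does not say whether they are drawn per particle or per dimension.

```python
import random


def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rng=random):
    """One velocity/position update for a single particle.

    x, v, pbest, gbest are equal-length lists of floats; w is the inertia
    weight and c1, c2 the learning factors (2.0 as in the experiments).
    """
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        # Assumption: r1 and r2 are redrawn for each dimension.
        r1, r2 = rng.random(), rng.random()
        vel = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vel)
        new_x.append(xi + vel)
    return new_x, new_v


# A particle already sitting on both best positions only keeps its
# inertia term, because the two attraction terms vanish.
x0 = [1.0, 2.0]
new_x, new_v = pso_step(x0, [0.5, -0.5], pbest=x0, gbest=x0, w=0.5)
```

The example at the end shows the role of the inertia weight in isolation: with both attraction terms equal to zero, the new velocity is simply w times the old one.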
In order to enhance the performance of the PSO algorithm and balance the global and local search of the population, the linear decreasing weight (LDW) strategy is adopted to make the inertia weight w change dynamically [35]. When w is large, the global optimization ability is strong, while the local optimization ability is weak; when w is small, the situation is just the opposite. This strategy can effectively prevent the PSO algorithm from falling into a local optimal solution. The formula is

$$w = w_{end} + (w_{ini} - w_{end}) \frac{G_k - g}{G_k}, \quad (3)$$

where $w_{ini}$ is the initial inertia weight, $w_{end}$ is the final inertia weight, g is the current number of iterations, and $G_k$ is the maximum number of iterations. In this paper, the cross-entropy loss function obtained by PLCNN is used as the fitness function, and the formula is

$$Loss = -\sum_{j=1}^{M} y_{ij} \log(p_{ij}), \quad (4)$$

where M is the number of categories classified; $y_{ij}$ is the variable 0 or 1 (if category j is the same as that of observation sample i, it is 1; otherwise, it is 0); and $p_{ij}$ is the predicted probability that observation sample i belongs to category j. The pseudocode of PLCNN is given in Algorithm 2. The specific implementation of the PLCNN model is shown in Figure 3.
Step 1 is to initialize the basic parameters of PSO and set the hyperparameters in Table 2 as the position vector of each particle.
Step 2 is to pass each particle to LWCNN for iterative training, with the cross-entropy loss function value used as the fitness function value.
Step 3 is to update the individual optimal value $x_{lp}^k$ and the population optimal value $x_g^k$ according to the fitness function value.
Step 4 is to judge whether the iterations have reached the maximum number. If so, step 6 is executed; otherwise, step 5 is executed.
Step 5 is to update the velocity vector, position vector, and inertia weight according to (1)-(3). Then, it returns to step 2.
Step 6 is to output the optimization results.
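The six steps can be sketched as a short, self-contained Python loop. This is a hedged illustration rather than the authors' implementation: the fitness callable stands in for a full LWCNN training run returning its cross-entropy loss, the final inertia weight `w_end = 0.4` is an assumed value (only the initial weight of 1 is given), and clamping positions to the search range is an added assumption. All function names are hypothetical.

```python
import math
import random


def ldw_weight(g, g_max, w_ini=1.0, w_end=0.4):
    """Linearly decreasing inertia weight; w_end = 0.4 is an assumed value."""
    return w_end + (w_ini - w_end) * (g_max - g) / g_max


def cross_entropy(y_onehot, probs):
    """Cross-entropy of one sample: -sum_j y_j * log(p_j)."""
    return -sum(y * math.log(p) for y, p in zip(y_onehot, probs) if y)


def pso_minimize(fitness, bounds, m=10, g_max=30, c1=2.0, c2=2.0, seed=0):
    """Minimize `fitness` over box `bounds` with m particles and g_max iterations."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: random initial positions inside the search ranges, zero velocity.
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(m)]
    vs = [[0.0] * dim for _ in range(m)]
    pbest = [x[:] for x in xs]
    pbest_fit = [math.inf] * m
    gbest, gbest_fit = xs[0][:], math.inf
    for g in range(g_max):  # Step 4: stop after the maximum number of iterations.
        # Steps 2-3: evaluate every particle and refresh personal/global bests.
        for l in range(m):
            fit = fitness(xs[l])
            if fit < pbest_fit[l]:
                pbest[l], pbest_fit[l] = xs[l][:], fit
            if fit < gbest_fit:
                gbest, gbest_fit = xs[l][:], fit
        # Step 5: update inertia weight, velocities, and positions.
        w = ldw_weight(g, g_max)
        for l in range(m):
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vs[l][j] = (w * vs[l][j]
                            + c1 * r1 * (pbest[l][j] - xs[l][j])
                            + c2 * r2 * (gbest[j] - xs[l][j]))
                # Clamp to the search range (an assumption, not from the text).
                xs[l][j] = min(hi, max(lo, xs[l][j] + vs[l][j]))
    return gbest, gbest_fit  # Step 6: output the optimization result.
```

In the setting of this paper, `fitness` would decode a ten-dimensional position into LWCNN hyperparameters, train the network, and return its loss; integer-valued or categorical dimensions such as filter counts and activation functions would need rounding or an index mapping, which this sketch leaves out.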

Data Acquisition.
Our approach is evaluated on the Ninapro Database 1 (DB1), which is an open standard data set of multichannel gesture sEMG signals in Sweden. The data is similar to that acquired in real-life conditions, so it is widely used in sEMG pattern recognition [36,37]. In DB1, the sEMG data from twenty-seven healthy subjects who performed fifty-two hand movements are recorded. The fifty-two movements are composed of Exercise A, Exercise B, and Exercise C, which are twelve finger movements, seventeen wrist movements, and twenty-three hand grabbing movements, respectively. These motions involve robotics and rehabilitation medicine, covering most of the hand movements encountered in daily life. They represent the sensitivity, flexibility, and dynamics of the hand. Figure 4 shows some example finger, wrist, and grabbing movements.
During the process of data collection, the subjects are asked to repeat the action of their right hand every three seconds, while each gesture action is repeated ten times, with each repetition lasting five seconds. Ten Ottobock electrodes are used to collect the data of radial brachial joint, flexor digitorum superficialis, and extensor digitorum superficialis of each subject's forearm at a sampling frequency of 100 Hz. Then the collected data go through a Hampel filter to eliminate 50 Hz power line interference. More details of the data can be found in the official literature [38].

ALGORITHM 1: The proposed lightweight convolution neural network.
Input: One-dimensional sEMG signal X with the size of 1 × insize
Output: y
Initialize learning rate and Loss
for Epoch = 1 to 300 do
    for k = 1 to (sample number of X / batch size) do
        Feature extract: X′ ← Convolution(X)
        Training: y ← train LWCNN(X′)
        Min (loss): $Loss = -\sum_{j=1}^{M} y_{ij} \log(p_{ij})$ (according to equation (4))
        Update the variables by gradient: Adam (learning rate, loss)
    end for k
end for Epoch
Obtain the prediction probability y

Preprocessing and Experimental Configuration.
We cut out the active sEMG action according to the labels. For each action, the sEMG data of each channel is taken as a sample. We randomly take 90% of the samples as the training set and 10% as the test set. Since the action will not be completely synchronized with the stimulus from the acquisition software with different reaction time and experimental conditions, the generalized likelihood ratio algorithm [39] is used to relabel the data. The relabeled data is padded with zeros to a uniform length and then normalized. The processed signal length insize corresponding to Exercise A, B, and C is 597, 594, and 556, respectively. We set the number of particles m to 10, the maximum iteration $G_k$ to 30, the learning factors $c_1 = c_2 = 2$ in (1), and the initial inertia weight $w_{ini} = 1$ in (3). As shown in Figure 5, the population has converged after continuous iteration. Accordingly, the optimal hyperparameters found by the PLCNN are shown in Table 3.

ALGORITHM 2: Pseudocode of EMG pattern recognition based on PLCNN.
Input: One-dimensional sEMG signal X with size of 1 × insize
Output: Optimal hyperparameters (particle position $x_g^k$ under minimum fitness value)
Initialize $x_g^k$ according to Table 2
for k = 1 to maximum iterations $G_k$ do
    for l = 1 to particle number m do
        Calculate the fitness value according to equation (4), save particle current position parameter $x_l^k$
    end for l
    According to equation (4): update the best position of each particle $x_{lp}^k$ and the best position of population $x_g^k$
    According to equations (1) and (2): update the particle velocity $v_l^{k+1}$ and position $x_l^{k+1}$
    According to equation (3): update inertia weight w
end for k
Obtain the optimal hyperparameters $x_g^k$
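The zero padding, normalization, and random 90%/10% split described in this section can be sketched in a few lines of Python. This is an illustrative sketch only: the text does not specify the normalization, so min-max scaling to [0, 1] is an assumption, as are the function names, right-side padding, and the fixed random seed.

```python
import random


def zero_pad(signal, insize):
    """Right-pad a 1-D signal with zeros to a fixed length (padding side assumed)."""
    if len(signal) > insize:
        raise ValueError("signal is longer than the target length")
    return list(signal) + [0.0] * (insize - len(signal))


def min_max_normalize(signal):
    """Scale a signal to [0, 1]; min-max scaling is an assumed choice."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0] * len(signal)
    return [(s - lo) / (hi - lo) for s in signal]


def train_test_split(samples, test_fraction=0.1, seed=0):
    """Randomly split samples into training and test lists."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(samples) * test_fraction)
    test = [samples[i] for i in idx[:n_test]]
    train = [samples[i] for i in idx[n_test:]]
    return train, test


# Pad a toy segment to the Exercise A length (597), then normalize it.
segment = min_max_normalize(zero_pad([1.0, 2.0, 3.0], 597))
train, test = train_test_split(list(range(100)))
```

Padding before normalizing follows the order given in the text; with min-max scaling this means the padded zeros take part in determining the minimum.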
In addition, as shown in Table 4, we compare CNN, LWCNN, and PLCNN to verify the high recognition ability of PLCNN. The network structure of PLCNN is simple, and the hyperparameters are obtained by PSO, while the hyperparameters of LWCNN network are set manually. Compared with these network structures, CNN is more complex, and its hyperparameters are also set manually.

Analysis of Results
Four evaluation indexes, accuracy (5), Kappa coefficient (6), Hamming loss (8), and Jaccard similarity coefficient (9), are used to comprehensively evaluate the performance of different recognition algorithms on DB1 training set.

Accuracy
$$\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}, \quad (5)$$

where tp is the number of samples correctly classified into the current gesture type, fp is the number of samples that do not belong to the current gesture type but are misclassified into it, tn is the number of samples that do not belong to the current gesture type and are classified into other types, and fn is the number of samples that belong to the current gesture type but are misclassified into other types.

Kappa Coefficient
$$K = \frac{p_o - p_e}{1 - p_e}, \quad (6)$$

where $p_o$ is the observed agreement, that is, the overall classification accuracy, and $p_e$ is usually expressed as follows:

$$p_e = \frac{\sum_{i=1}^{M} a_i b_i}{N^2}, \quad (7)$$

where M is the number of categories classified, $a_i$ is the actual number of class i samples, $b_i$ is the predicted number of class i samples, and N is the total number of samples. As shown in Table 5, if the parameter K (K ∈ [0, 1]) is larger, the classification accuracy of the model is higher.

Hamming Loss
$$H = \frac{1}{N} \sum_{i=1}^{N} \text{XOR}(Y_i, P_i), \quad (8)$$

where N is the total number of samples, $Y_i$ is the true value of sample i, $P_i$ is the predicted value of sample i, XOR(0, 1) = XOR(1, 0) = 1, and XOR(0, 0) = XOR(1, 1) = 0. Obviously, when the Hamming loss H (H ∈ [0, 1]) is smaller, the difference between predicted and actual is smaller, and the classification ability of the used algorithm is stronger.

Jaccard Similarity Coefficient
For EMG pattern recognition problem, if the Jaccard similarity coefficient J is larger, the prediction results are more consistent with the actual results, and the performance of model action classification is better.
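The four evaluation indexes can be computed from lists of true and predicted labels as in the following sketch. It is a hedged illustration, not the evaluation code used in the experiments: the function names are hypothetical, Hamming loss is taken as the fraction of misclassified samples for single-label prediction, and the Jaccard coefficient is averaged over classes with equal weight (a macro average), which is an assumption because the text does not state the averaging.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of misclassified samples (single-label case)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e); undefined when p_e equals 1."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    p_o = accuracy(y_true, y_pred)
    p_e = sum(y_true.count(c) * y_pred.count(c) for c in classes) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def macro_jaccard(y_true, y_pred):
    """Per-class |intersection| / |union|, averaged over classes (assumed)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        inter = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        union = sum(t == c or p == c for t, p in zip(y_true, y_pred))
        scores.append(inter / union)
    return sum(scores) / len(scores)


# Toy example with three classes and one misclassified sample.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
scores = (accuracy(y_true, y_pred), kappa(y_true, y_pred),
          hamming_loss(y_true, y_pred), macro_jaccard(y_true, y_pred))
```

For single-label classification the Hamming loss defined this way is simply one minus the accuracy, so the two indexes always move together.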
For the traditional machine learning algorithm, the recognition task is generally divided into feature extraction and feature classification. In the experiment, we extract four common EMG signal features: integrated EMG [40], root mean square [41], mean absolute value [42], and waveform length [43]. Then, they are sent to three kinds of feature classifiers: SVM [44], RF [45], and KNN [46]. Moreover, ordinary CNN, LWCNN with manually set hyperparameters, and optimized PLCNN based on deep learning algorithm can directly extract and classify the original EMG signals end-to-end. The performance of the proposed PLCNN and the other five popular recognition algorithms is described by radar charts based on the four evaluation indexes. Figures 6-8 show the comparison results on the three exercise sets A, B, and C, respectively. The closer a point is to the center, the lower the index value.
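For reference, the four hand-crafted time-domain features named above have simple standard definitions, sketched here in Python for a single analysis window. The function names are hypothetical, and the window length and any overlap used in the experiments are not stated in the text, so the example applies the features to a toy three-sample window.

```python
import math


def iemg(x):
    """Integrated EMG: sum of absolute amplitudes in the window."""
    return sum(abs(s) for s in x)


def mav(x):
    """Mean absolute value: IEMG divided by the window length."""
    return iemg(x) / len(x)


def rms(x):
    """Root mean square amplitude of the window."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def waveform_length(x):
    """Cumulative absolute difference between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


window = [1.0, -2.0, 3.0]
features = [iemg(window), rms(window), mav(window), waveform_length(window)]
```

A feature vector like `features` would be computed per channel and then passed to a classifier such as SVM, RF, or KNN, whereas the CNN-based models consume the signal directly.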
As can be seen from Figures 6-8, the accuracy of the PLCNN model in Exercise A, B, and C reaches 86.67%, 90.06%, and 89.57%, respectively. This means that our model outperforms the other five algorithms by 19%-38%, 12%-17%, and 4%-7%. Among the three traditional machine learning classifiers, RF has the highest accuracy due to its strong resistance to overfitting and excellent stability. It is difficult for SVM to adapt to the multiclassification problem, and KNN has poor fault tolerance for training data. Compared with the traditional algorithms, the sEMG features extracted by deep learning models are more representative and easier to recognize.
In terms of Kappa coefficient, Jaccard similarity coefficient, and Hamming loss, the performance of the three traditional machine learning algorithms is significantly worse than that of the neural network algorithms. In particular, the Kappa coefficients of PLCNN on Exercise A, B, and C are 0.8545, 0.8938, and 0.8909, respectively. The Jaccard similarity coefficients are 0.7511, 0.8182, and 0.7869, respectively. Compared with the other five algorithms, these two evaluation indicators of PLCNN are the highest. The Hamming loss of PLCNN is 0.1333, 0.1, and 0.1043, respectively, which is the lowest among all the six algorithms. The results illustrate that the prediction results of PLCNN model are accurate, and its classification is precise.
In conclusion, LWCNN and PLCNN can recognize sEMG signals effectively. However, the PLCNN involves PSO to determine the optimal hyperparameters automatically and achieves better performance than LWCNN, which adjusts hyperparameters manually. The running time represents the complexity of the algorithm and reflects the speed of sEMG pattern recognition. Table 6 shows the average test time of the six algorithms in three exercises and all fifty-two actions. PLCNN takes 5 ms, 7 ms, 9 ms, and 19 ms to recognize twelve, seventeen, twenty-three, and fifty-two actions, respectively. It is faster than the traditional machine learning algorithms, CNN, and LWCNN by 20-400 ms, 3-9 ms, and 1-3 ms, respectively. At the same time, the average test time of PLCNN is significantly shorter than that of the paper [25].

Table 3: Optimal hyperparameters setting.
Dimensions of each particle | Hyperparameter | Value
... | ... | ...
7 | Activation function of dense-1 | ReLU
8 | Activation function of dense-2 | ReLU
9 | Batch size | 42
10 | Learning rate | 0.001
As one of the effective indexes in evaluating and identifying tasks, the confusion matrix map is widely used in various classification networks. As shown in Figures 9-11, in order to further compare the performance of three networks based on deep learning, we take Exercise A as the visual experimental data set and give the corresponding confusion matrix diagram.
As shown in Figures 9-11, the ordinary CNN model achieves high accuracy for the fifth type of action (ring flexion), since the action amplitude is large and easy to recognize when the ring finger is bent. LWCNN and PLCNN also perform well on this action. However, the accuracy of CNN for the remaining eleven movements is not very high; in particular, it reaches only 50% for the ninth movement, which is 30% lower than LWCNN. The LWCNN network structure is simpler than CNN, so it is less prone to saturation and classifies sEMG signals more accurately. PLCNN has precise recognition ability: it classifies five of the twelve movements correctly, and its accuracy in the ninth category is 10% higher than that of LWCNN.
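Per-movement accuracies such as those quoted above are read off the diagonal of a confusion matrix. The following Python sketch shows one way to build the matrix and extract the per-class accuracy from it; the function names and the toy labels are hypothetical and are not the data behind Figures 9-11.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_accuracy(cm):
    """Diagonal entry divided by the row sum, for each true class."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(cm)]


# Toy example: class 1 is confused with class 2 half of the time.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], n_classes=3)
acc_per_class = per_class_accuracy(cm)
```

Off-diagonal entries show which movements are mistaken for one another, which is the information the confusion matrix diagrams add beyond a single accuracy figure.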
In order to verify the robustness and convergence of the model, the loss value and accuracy of LWCNN and PLCNN are visualized in the process of training and testing. Figures 12(a), 13(a), and 14(a) represent the loss change graph of LWCNN and PLCNN on Exercise A, B, and C, respectively. The corresponding loss values are smoothed by the log function. Figures 12(b), 13(b), and 14(b) show the accuracy variation diagram of LWCNN and PLCNN on Exercise A, B, and C, respectively.
In the training process, the loss values of LWCNN and PLCNN decrease with iteration increasing and effectively converge, as shown in Figures 12(a), 13(a), and 14(a). However, the convergence speed of PLCNN is obviously faster than that of LWCNN, and the global loss value is lower than that of LWCNN. This shows that the optimized hyperparameters of PLCNN model improve the performance of the model and accuracy of the predicted category. Moreover, the loss on the verification set satisfies the same distribution, and the overall loss value is slightly higher than in the training process. These effectively prove the good convergence and stability of the model. Figures 12(b), 13(b), and 14(b) illustrate that the accuracy of training and testing increases with a larger number of iterations. Therefore, the trend of accuracy accords with the corresponding loss distribution, which further demonstrates the robustness of PLCNN model and highlights its high recognition ability in sEMG signals.

Conclusion
In this paper, a novel PLCNN method is proposed to recognize EMG signals quickly and accurately. The whole network is compact and simple, including three convolution layers and two full connection layers to extract and classify sEMG features. The PSO algorithm is exploited to search for self-adaptive hyperparameters, which is significantly more efficient than traditional manual hyperparameter tuning. Moreover, the experiments illustrate that the proposed model reduces manual uncertainty and improves recognition performance. Notably, the proposed method obtains recognition accuracy of 86.67%, 90.06%, and 89.57% for Exercise A, B, and C, respectively. It is worth mentioning that the experiments in this paper cover almost all human hand movements under Ninapro Database 1, and the proposed method outperforms the five other state-of-the-art methods in terms of speed and accuracy. In the future, we will investigate other evolutionary algorithms [47][48][49] and machine learning techniques [50] to further improve the recognition rate and recognition speed, and will take the hardware system design into account to research the EMG pattern recognition problems.

Data Availability
The EMG data used to support the findings of this study have been deposited in the Ninapro repository (https://doi.org/2010.1109/BioRob.2012.6290287).

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.