Multi-Inertial Sensor-Based Arm 3D Motion Tracking Using Elman Neural Network



Introduction
Human motion tracking is the procedure by which human movements are detected quantitatively and qualitatively via on-body sensors [1]. Nowadays, this technology is applied in a wide range of fields, including medical health [2], virtual reality [3], and sports biomechanics [4].
Current motion tracking methods include marker-based optical tracking, exoskeleton-based mechanical tracking, and IMU-based (inertial measurement unit-based) tracking. The optical tracking system has good accuracy, but it requires multiple fixed high-quality cameras and is thus restricted to a relatively small indoor space [5]. Mechanical tracking faces the problem of reducing the error between the mechanical rotational axis and the human joint [6]. IMU sensors consist of an accelerometer, a gyroscope, and a magnetometer, which measure the orientation of the rigid body they are attached to, making it possible to track human motion [7]. However, this method also suffers from limitations such as long-term drift, magnetic interference, and inconsistency [8].
Thanks to the rapid development of microelectromechanical systems (MEMS), IMU-based methods have received much attention for their portability and low cost [9]. In recent years, considerable research has been carried out on IMU-based methods, especially in the data fusion field [10]. Zhu and Zhou designed a real-time motion tracking system based on a Kalman filter using IMUs [11]. Xiaoping et al. developed a quaternion-based extended Kalman filter to obtain the optimal orientations [12]. Fourati et al. presented a complementary observer to calculate the attitude information based on quaternions [13]. Atrsaei et al. introduced a velocity constraint on the IMU to track fast motion [14]. Chen et al. improved the real-time tracking strategy by combining displacement and movement angle using the complementary and Kalman filters [15]. All these works aim at the optimal estimation of quaternions through traditional filtering technology. However, they do not resolve the problems of calibration and of magnetic field distortion caused by different environments.
Another problem of the IMU-based method is the alignment of multiple sensors. Zimmermann et al. developed an LSTM model to align the IMU to the body segment in order to obtain biomechanical joint angles [16]. Chen et al. designed a novel online IMU-based human gait estimation framework, which introduces kinematic chain constraints between multiple segments, achieving adaptive alignment and drift rejection [17]. These works contribute to alignment, but they cannot avoid a complicated modeling process.
Nowadays, owing to the advances in artificial intelligence, more researchers focus on how to improve IMU measurements with ANNs (artificial neural networks) [18]. Zhang et al. [19] proposed a DNN model to process the IMU data, integrating the DNN-estimated value and the numerical value to obtain a more reliable pose. To get a more precise pose, Brossard et al. [20] applied a convolutional neural network to regress the gyro corrections. That work aims at denoising the gyroscopes and achieves good precision compared with other methods. Though it is not designed for human motion tracking, it shows the possibility of applying neural networks to this area. Compared with complex deep learning models, the ENN (Elman neural network) has won the favor of researchers due to its structure and its successful application to nonlinear problems [21]. Kolanowski et al. presented an ENN-based navigation system to estimate the attitude of the rigid body to which the IMU is attached [22]. Guo et al. proposed an attitude calculation algorithm aided by an ENN to overcome the IMU's poor adaptability to environments [23]. Chong et al. proposed a genetic Elman neural network to improve the precision of gyroscope temperature drift modeling [24]. All these works show the possibility of estimating the human motion trace by processing the IMU data with a neural network.
To eliminate extreme measurement noise and avoid the influence of the environment, this paper proposes an ENN-based method for human arm motion tracking with three attached IMUs. The high-end optical motion tracking system, Opti Track, provides the ground truth, and the ENN is trained to estimate the arm traces. Real-world experiments of arm motion tracking are then carried out to verify the effectiveness of the proposed method. The results show that the accuracy and robustness of the method are both acceptable.
The rest of this paper is organized as follows. Section 2 provides detailed information about the proposed method. Section 3 reports the environment, process, and results of the experiments. In Section 4, the authors discuss the results and possible error sources. Finally, Section 5 draws the conclusion and outlines future work.

Methods
Generally, a human arm can be modeled as three joints connecting two rigid bodies, as shown in Figure 1. The arm is simplified as two consecutive links, and three IMUs are attached to the three joints (the wrist, elbow, and shoulder) to describe the motion trace of the arm. Each IMU is described in a frame comprising O_i, which denotes the orientation, and three further components that give the position coordinates in the corresponding coordinate system.
Based on this human arm model, an ENN-based model is presented to estimate the trace of the arm. Figure 2 depicts the procedure of the proposed method, which contains two steps.
Step 1. Data preprocessing: the collected data are first segmented and then preprocessed. The acceleration and angular velocity are used to calculate the attitude information, which is then aligned in the same frame.
Step 2. Training/estimation: the preprocessed data are set as the input of the ENN, while the ground truth coordinates are collected by the optical tracking system. The output coordinate $P = [x\ y\ z]^T$ describes the arm trace. In this step, we introduce feedback to help optimize the body acceleration.
Once the IMU data have been collected and preprocessed, the well-trained ENN model is called to compute the coordinates of the arm, and a smoother is then applied to obtain the final trace of the arm.
... gyroscope [25]. The acceleration can be decomposed into three components as $a = g + a_b + \varepsilon$, where $g$ is the gravitational acceleration, $a_b$ denotes the body acceleration (the acceleration generated by the person's movements), and $\varepsilon$ represents the measurement noise, which is usually Gaussian distributed. Among the three components, $g$ can be considered a constant vector for any object; thus, it is feasible to isolate $g$ to estimate the orientation of the object. Assuming that the noise $\varepsilon$ can be neglected, we design a low-pass filter for the acceleration signal to extract the gravity component [26].
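The gravity extraction described above can be sketched as follows. This is only an illustration: the text does not specify the filter design, so a first-order IIR low-pass is assumed here, and the function name `split_gravity` and the smoothing factor `alpha` are hypothetical.

```python
import numpy as np

def split_gravity(acc, alpha=0.9):
    """Split accelerometer samples (N x 3) into gravity and body parts.

    Assumption: a first-order IIR low-pass tracks the slowly varying
    gravity vector g; the remainder a - g is taken as the body
    acceleration a_b (the noise term is neglected, as in the text).
    alpha is a hypothetical smoothing factor, not a value from the paper.
    """
    acc = np.asarray(acc, dtype=float)
    gravity = np.empty_like(acc)
    gravity[0] = acc[0]  # initialise the filter with the first sample
    for k in range(1, len(acc)):
        gravity[k] = alpha * gravity[k - 1] + (1.0 - alpha) * acc[k]
    body = acc - gravity
    return gravity, body
```

A larger `alpha` gives a lower cutoff frequency, so more of the motion ends up in the body component; the value would have to be tuned to the sampling rate.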
In practice, the magnetic field m measured by the magnetometer is affected by ferrous materials in the environment, so it is necessary to calibrate the magnetometer before estimating the arm movements [27]. However, calibration becomes inconvenient, even troublesome, as the number of sensors increases. Considering that a and ω are capable of offering enough information about arm movements, m is removed from the orientation estimation. Thus, the adaptability of the IMU to different environments can be improved to some extent. However, the change of magnetic field strength with the movement is an important feature, so it is used to improve the ENN model, as detailed in the next section.
For arm motion tracking, it is particularly important to obtain the attitude information of the arm segment; hence, it is necessary to feed the attitude information into the network [28]. There are three common methods to calculate the attitude, namely, the Euler algorithm, the direction cosine method, and the quaternion method. The direction cosine algorithm is widely used in navigation coordinate systems; however, its complicated calculation restricts its application in motion tracking. The quaternion method offers fast computation and handles all kinds of attitude calculation, but it does not allow the attitude angles to be separated directly and easily becomes unstable once the measurement of one sensor is disturbed [29]. Though the Euler angles suffer from gimbal lock, they are more understandable and efficient in decomposing rotations into individual degrees of freedom, and they require less computational effort [30].
This paper applies the Euler algorithm to calculate the attitude. Solving Equation (3) gives the roll (ϕ_a) and pitch (θ_a), and solving Equation (4) gives the roll (ϕ_g), pitch (θ_g), and yaw (ψ_g). The final attitude is obtained by fusing the data from the accelerometer and the gyroscope, where the subscript a or g denotes an angle calculated from the accelerometer or the gyroscope data, respectively, and K is a scale factor, whose value is 0.4 in our example.
... calculate the coordinates. The coordinate calculated by the Opti Track system is set as the ground truth data. To find the mapping between the IMU data and the coordinates, we develop an ENN for each IMU.
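The accelerometer/gyroscope fusion of the Euler angles described above might look like the sketch below. Several points are assumptions rather than the paper's equations: the roll/pitch formulas are the common gravity-based expressions for a sensor whose z-axis reads +g at rest, K is assumed to weight the accelerometer angle (the text only says K is a scale factor equal to 0.4), and yaw is taken from the gyroscope alone because the magnetometer is excluded from the orientation estimate. The function names are hypothetical.

```python
import numpy as np

K = 0.4  # scale factor quoted in the text

def accel_roll_pitch(g_vec):
    """Roll and pitch (rad) from a gravity vector (gx, gy, gz).

    Assumed convention: at rest the sensor reads (0, 0, +g).
    """
    gx, gy, gz = g_vec
    roll = np.arctan2(gy, gz)
    pitch = np.arctan2(-gx, np.hypot(gy, gz))
    return roll, pitch

def fuse_attitude(acc_angles, gyro_angles, k=K):
    """Weighted fusion of accelerometer and gyroscope Euler angles.

    Assumption: k weights the accelerometer estimate and (1 - k) the
    gyroscope estimate; yaw comes from the gyroscope only.
    """
    roll_a, pitch_a = acc_angles
    roll_g, pitch_g, yaw_g = gyro_angles
    roll = k * roll_a + (1.0 - k) * roll_g
    pitch = k * pitch_a + (1.0 - k) * pitch_g
    return roll, pitch, yaw_g
```

If the two sources agree, the fused angle equals either of them; otherwise the result sits between the drift-free but noisy accelerometer angle and the smooth but drifting gyroscope angle.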
The ENN is a kind of recurrent neural network (RNN), and its structure is depicted in Figure 3. In addition to the input layer, it is composed of three layers, namely, the hidden layer, the output layer, and the context layer [31]. Compared with other ANNs, the ENN has the distinctive advantage that its context nodes memorize the previous values of the hidden nodes, which makes it applicable to dynamic system identification and prediction [32].
The Elman network can also be described by a set of state equations over the symbols defined in Figure 3.
During the training process, we pay more attention to the axis that changes most drastically. Although the accuracy of m is severely affected by the environment, the relative changes across different actions are similar. Therefore, m can be used to improve the traditional MSE (mean square error) loss function. We redesign the loss function as a weighted combination of the per-axis errors, where w_x, w_y, and w_z are the error weights calculated according to the change of magnetic field strength and e denotes the MSE on each axis. Equation (10) gives the calculation of w_x as an example, where std denotes the standard deviation of the vector in its subscript.
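The Elman recurrence and the magnetometer-weighted loss described above can be illustrated with the following sketch. It is a textbook-style formulation under stated assumptions, not the paper's exact equations: w1 is taken as the context-to-hidden weight, w2 as the input-to-hidden weight, and w3 as the hidden-to-output weight; the hidden transfer function is tanh and the output is linear (as in the Figure 3 caption); the current input is used at each step; and each axis weight is assumed to be that axis's magnetometer standard deviation divided by the sum over the three axes. All function names are hypothetical.

```python
import numpy as np

def elman_forward(U, w1, w2, w3, b1, b2):
    """Run an Elman network over an input sequence U of shape (N, n_in)."""
    x_c = np.zeros(w1.shape[0])          # context state, starts at zero
    outputs = []
    for u in U:
        x = np.tanh(w1 @ x_c + w2 @ u + b1)  # hidden layer
        outputs.append(w3 @ x + b2)          # linear output layer
        x_c = x                              # context copies the hidden state
    return np.array(outputs)

def axis_weights(mag):
    """Assumed weighting: per-axis std of the magnetometer, normalised."""
    s = np.std(np.asarray(mag, dtype=float), axis=0)
    return s / s.sum()

def weighted_mse(pred, target, weights):
    """Weighted sum of the per-axis mean square errors."""
    e = np.mean((np.asarray(pred) - np.asarray(target)) ** 2, axis=0)
    return float(np.dot(weights, e))
```

With this weighting, an axis along which the magnetic field changes more during the motion contributes more to the loss, which matches the stated intent of emphasising the drastically changing axis.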
We introduce feedback to calibrate the input a_b. The predicted coordinate is used to compute the body acceleration, which helps correct the input. Again, we take the x-axis as an example. First, the acceleration a_oi at sample i on the x-axis in the Opti frame is calculated from the predicted coordinates. Since the IMU and Opti frames are aligned, the estimated body acceleration â_b can be obtained by rotating a_oi with the corresponding rotation matrix. The weighted average of the measured and estimated body accelerations is then regarded as the corrected a_b.
The input matrix collects, for each of the N samples, the input components calculated in Section 2.1, while the target matrix collects the corresponding coordinates (x_i, y_i, z_i), i = 1, ..., N, provided by the Opti Track system. After the estimation, we apply the five-dot-cubic (five-point cubic smoothing) algorithm [33] to smooth the coordinates.
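The feedback correction and the final smoothing step can be sketched as below. The differentiation scheme is an assumption (a central second difference, since the paper's formula is not reproduced here), as are the averaging weight and the function names. The smoother uses the standard five-point cubic least-squares coefficients, which reproduce any cubic polynomial exactly.

```python
import numpy as np

def second_difference(pos, dt):
    """Acceleration from positions by a central second difference (assumed)."""
    pos = np.asarray(pos, dtype=float)
    acc = np.zeros_like(pos)
    acc[1:-1] = (pos[2:] - 2.0 * pos[1:-1] + pos[:-2]) / dt ** 2
    acc[0], acc[-1] = acc[1], acc[-2]    # copy neighbours at the two ends
    return acc

def corrected_body_acc(measured, acc_opti, R, weight=0.5):
    """Weighted average of measured and estimated body acceleration.

    R rotates Opti-frame vectors into the IMU frame; weight is hypothetical.
    """
    estimated = np.asarray(acc_opti, dtype=float) @ np.asarray(R).T
    return weight * np.asarray(measured, dtype=float) + (1.0 - weight) * estimated

def five_point_cubic_smooth(y):
    """Five-point cubic smoothing along the first axis of y."""
    y = np.asarray(y, dtype=float)
    if len(y) < 5:
        return y.copy()                  # too short to smooth
    out = np.empty_like(y)
    out[0] = (69 * y[0] + 4 * y[1] - 6 * y[2] + 4 * y[3] - y[4]) / 70
    out[1] = (2 * y[0] + 27 * y[1] + 12 * y[2] - 8 * y[3] + 2 * y[4]) / 35
    out[2:-2] = (-3 * y[:-4] + 12 * y[1:-3] + 17 * y[2:-2]
                 + 12 * y[3:-1] - 3 * y[4:]) / 35
    out[-2] = (2 * y[-5] - 8 * y[-4] + 12 * y[-3] + 27 * y[-2] + 2 * y[-1]) / 35
    out[-1] = (-y[-5] + 4 * y[-4] - 6 * y[-3] + 4 * y[-2] + 69 * y[-1]) / 70
    return out
```

Because differentiating twice amplifies noise, a small averaging weight on the estimated term may be preferable in practice; that choice is left open here.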

Experimental Setup.
To verify the efficiency of the proposed algorithm, an experiment was carried out. The ground truth was obtained from the Opti Track Motive system, which has millimeter-level accuracy, while tracking a subject equipped with 4 IMUs. Three of them were attached to the left arm of the subject (on the wrist, elbow, and shoulder), with three corresponding markers for the optical system to track. The fourth IMU was fixed on the chest of the subject as a reference. Here, the x-axis points forward, the y-axis points to the right side, and the z-axis is perpendicular to the ground. Figure 4 shows how the subject wears the IMUs and the markers. After the experimental environment was set up, the subject was asked to perform several movements at a relatively slow speed, including forward-smooth-lift (FSL), lateral-smooth-lift (LSL), forearm-supination (FS), and elbow-smooth-lift (ESL). Each movement starts and ends with the N-pose gesture (standing still on the ground with the arms vertical alongside the trunk) and lasts for at least 10 seconds. Figure 5 shows how the movements are organized.
Captured data include the marker positions in the optical coordinate system and the IMU signals in the respective sensor reference systems. The Opti Track data were sampled at 120 Hz, while the IMU (HI221, hipnuc) data were sampled at 35 Hz. We resampled the Opti data so that its frequency is exactly the same as that of the IMU data. The IMU and Opti data were captured by different terminals, but they were manually synchronized using the N-pose gesture at the beginning and end of each sequence. This synchronization method may result in a temporal misalignment, but this is acceptable because the offset is quite short.
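The resampling of the 120 Hz optical data to the 35 Hz IMU rate could be done as in the sketch below. The text does not state the resampling method, so linear interpolation with `numpy.interp` is assumed; rows containing NaN stand in for occluded markers and are skipped before interpolating. The function name `resample_positions` is hypothetical.

```python
import numpy as np

def resample_positions(t_src, pos_src, rate_out=35.0):
    """Linearly resample marker positions (N x 3) to a new sampling rate.

    Assumption: occluded samples are marked as NaN rows and are bridged
    by linear interpolation between the nearest valid samples.
    """
    t_src = np.asarray(t_src, dtype=float)
    pos_src = np.asarray(pos_src, dtype=float)
    valid = ~np.isnan(pos_src).any(axis=1)           # rows without gaps
    t_out = np.arange(t_src[0], t_src[-1] + 1e-9, 1.0 / rate_out)
    pos_out = np.column_stack([
        np.interp(t_out, t_src[valid], pos_src[valid, k])
        for k in range(pos_src.shape[1])
    ])
    return t_out, pos_out
```

For example, one second of optical data sampled at 120 Hz would be reduced to samples spaced 1/35 s apart, which can then be paired with the IMU samples after the N-pose synchronization.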

Performance Index.
To assess the performance of our method, we define several indices to evaluate the model accuracy and robustness. Given two variables with N samples, P (the estimated position on one axis) and G (the ground truth position on the same axis), the following indices are used for assessment: the mean error E, the maximum error E_m, and the correlation coefficient C.

Results.
The aligned IMU and Opti data are fed into the Elman NN to train the model, with the data separated into a training set (70%) and a test set (30%). Then, we apply the trained model to a new independent motion to assess the generalization ability of the model. We have compared the accuracy and the robustness in several respects, and the results are listed as follows. Here, we pay more attention to the x-axis and the z-axis, because the arm motions in the experiments involve little movement along the y-axis, which can be regarded as random error.
First, the data captured in one motion are used to verify the proposed algorithm. Figure 6 depicts the error between the estimated coordinate and the ground truth data on the x-axis of the wrist in the four motions, which shows that the proposed method recovers the trace of the arm relatively accurately. Table 1 reports all the performance indices of the different parts of the arm in the 4 motions on the test sets.
Then, to evaluate the robustness of the method, the well-trained model is used to estimate another four independent motions. Figure 7 is a boxplot of the errors on the x-axis of the wrist in the four motions. Table 2 reports the performance indices of the z-axis on the wrist in four motions. Finally, to further verify the effectiveness of the proposed method, we compare it with four classical methods, namely, the Zhu model [11], Yun model [12], Young model [34], and Bleser model [35], based on the dataset of [8]. Figure 8 compares the errors on the x-axis between the selected methods on the E_FE motion (elbow flexion/extension), and Table 3 reports the errors on the three axes of the five methods.
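The three performance indices (mean error, maximum error, and correlation coefficient) can be computed as in the following sketch. The exact formulas are not reproduced in the text, so this assumes the mean and maximum of the absolute error and the Pearson correlation coefficient; the function name is hypothetical.

```python
import numpy as np

def performance_indices(P, G):
    """Return (E, E_m, C) for estimated positions P and ground truth G.

    Assumptions: E is the mean absolute error, E_m the maximum absolute
    error, and C the Pearson correlation coefficient between P and G.
    """
    P = np.asarray(P, dtype=float)
    G = np.asarray(G, dtype=float)
    err = np.abs(P - G)
    return err.mean(), err.max(), np.corrcoef(P, G)[0, 1]
```

Note that C is insensitive to a constant offset between P and G, so a trace can correlate highly with the ground truth while E remains large.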

Discussion
This section discusses the performance of the proposed method in tracking the arm, based on the results in Section 3, from two aspects: accuracy and robustness. Some possible error sources of this work are then described, which can guide our future work.

Accuracy.
The first aspect taken into consideration is the accuracy, which is reflected by the mean and maximum errors on the three axes. Generally, the smaller the errors are, the more accurate the model is. The analysis of the errors in Figure 6 and Table 1 suggests that the accuracy of the method is acceptable. On the one hand, the 12 well-trained models (each motion has one model for each part) all perform well. The mean error of each model is around 30 mm, and the maximum errors are around 50 mm (very few exceed 100 mm). On the other hand, the errors on the x-axis are similar to those on the z-axis (each axis performs better in different motions). The reconstructed IMU motion trajectory has a high correlation with the Opti system, and only 4 values are lower than 0.85. This shows that the reconstructed motion is highly consistent with the actual motion, so the characteristics of the original action can be recovered from the reconstructed motion. From this perspective, the accuracy of our proposed method is good. A comparison between the proposed method and other traditional methods is also conducted. Based on the open-source dataset provided in the literature, we compared the errors of these methods on the three axes. Figure 8 shows the errors on the x-axis, and Table 3 reports the errors on the three axes in detail. The proposed method performs best on the x- and y-axes, reducing the mean errors by about 37.2%, and the error on the z-axis is also acceptable.

Robustness.
The next important aspect is robustness, which is reflected by the performance of other independent estimations based on the well-trained models. Generally, robustness refers to the ability of the model to tolerate perturbations. We have tested another four independent actions for each motion to verify the robustness of the proposed method.
Figure 7 presents the distribution of the error on the x-axis in the new actions. The red symbol, "+", represents the outliers (values lying more than 1.5 times the interquartile range beyond the quartiles). The mean errors of the new actions are similar to those of the test set, while the gross errors appear to have increased. Table 2 further supports this point. The E_m of the motion has increased by about 40 mm, which points to a weakness in the robustness of this method. Nevertheless, the correlation values are consistent with those in Table 1, showing that the estimated trace can reconstruct the human motion. Therefore, the proposed method can predict the trajectory of the same motion well, regardless of whether the actions are continuous. Although the maximum errors and outliers become larger, which could be reduced by introducing the kinematic chain in the future, the consistency of the actions has not decreased. Overall, the robustness of the model is acceptable.
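The outlier rule used in the boxplot of Figure 7 can be made concrete with a short sketch. It assumes the conventional boxplot definition (points more than 1.5 interquartile ranges below the first quartile or above the third quartile); the function name `iqr_outliers` is hypothetical.

```python
import numpy as np

def iqr_outliers(errors):
    """Boolean mask of boxplot outliers using the 1.5 x IQR rule."""
    errors = np.asarray(errors, dtype=float)
    q1, q3 = np.percentile(errors, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (errors < lower) | (errors > upper)
```

Counting the flagged samples per motion gives a simple numerical companion to the "+" symbols in the plot.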

The Error Sources.
Though the accuracy and robustness are acceptable overall, there are still some unpredictable errors (such as the maximum errors in Tables 1 and 2), which may be caused by the following two aspects.
4.3.1. Experiment. In Table 2, E and E_m are larger than those in Table 1 while C is similar; thus, these errors may be caused by the independent experiments. There is some difference between the two experiments in the position where the experimenter stands, which may cause a relatively constant error on the x- or y-axis. However, this does not seriously affect the reconstruction of the arm trace, because the reconstructed arm movements are still clear and the correlations remain high.
4.3.2. Data. The maximum errors shown in Section 3 cannot be ignored and may result from the data collected in our experiments. On the one hand, some values of the ground truth data (Opti data) are missing because of unavoidable occlusion of some markers. We filled the missing values by interpolation and resampled the data to the same frequency as the IMU data, which may introduce some outliers with errors over 100 mm. On the other hand, the IMUs worked continuously during the entire experiment, which may result in more noise in the last several motions.

Conclusion
This paper proposes an arm motion tracking method based on wearable inertial sensors, using the ENN. This method effectively avoids the poor adaptability to the environment of traditional inertia-based solving methods. In terms of model training, the magnetometer information measured by the IMU is used to train the model, and the acceleration and angular velocity are used to calculate the attitude angles, which are set as the ENN input vector.
To calibrate the body acceleration, feedback is designed so that more accurate results can be derived. Finally, the five-dot-cubic algorithm smooths the estimated trace to reduce its errors. Experiments verify the effectiveness of the proposed method in both accuracy and robustness. In addition, this article also uses open-source data to compare the method with other traditional estimators, further verifying the reliability of the ENN-based method. In practical applications, this method suits situations where fixed motions require assessment, including rehabilitation and fitness exercises.
The italicized values highlight the models that perform best.

Figure 2: Flowchart diagram of the proposed method for one segment. Vectors a_i, ω_i are the values measured by the IMU at the ith sample, ϕ_i, θ_i, ψ_i are the Euler angles (roll, pitch, and yaw), and the vector P is the estimated position (a three-dimensional coordinate) of the IMU-attached object.

Figure 3: Structure of the Elman neural network. In this structure, u is the input vector and y is the output vector; w_1, w_2, and w_3 are the weights; x is the vector of units in the middle (hidden) layer and x_c is the feedback state vector; b_1 and b_2 are the bias vectors; f is the transfer function of the hidden layer, usually a tanh function; and g is the transfer function of the output layer, usually a linear combination of the outputs of the middle layer.

Figure 4: Positions of the IMUs and optical track markers on the subject from the front and left sides.

Figure 6: Errors on the x-axis on the wrist in four motions, with the red dotted line representing the mean error of each motion.

Figure 7: Errors on the wrist in four motions. A well-trained model is applied to estimate another four independent motions. The boxplot shows the median of the error along with the 25th and 75th percentiles.

Figure 8: Comparison of performances on the x-axis of E_FE. The bar plots show the mean error, median, and maximum errors of the proposed method and the other 4 traditional methods.

Table 1: Performance indices of each test set.

Table 2: Performance indices of new estimations on the x-axis of the elbow. Each motion has one well-trained model, which is applied to estimate another four independent actions. The italicized values highlight the models that perform worse.

Table 3: Errors on the three axes of the proposed and traditional methods.