Pedestrian Stride Length Estimation from IMU Measurements and ANN Based Algorithm

Pedestrian dead reckoning (PDR) can be used for continuous position estimation when satellite or other radio signals are not available, and its accuracy depends strongly on the accuracy of the stride length measurement. Current stride length estimation algorithms, including linear and nonlinear models, consider only a few variable factors, and some rely on high-precision, high-cost equipment. This paper puts forward a stride length estimation algorithm based on a back propagation artificial neural network (BP-ANN), using a consumer-grade inertial measurement unit (IMU); it then discusses the various factors used in the algorithm. The experimental results indicate that the error of the proposed algorithm in estimating the stride length is approximately 2%, which is smaller than that of the frequency and nonlinear models. Compared with the latter two models, the proposed algorithm does not need to determine individual parameters in advance, provided the trained neural net is effective. It can thus be concluded that this algorithm shows superior performance in estimating pedestrian stride length.


Introduction
Global navigation satellite systems (GNSS) play an important role in daily life; however, in some places satellite signals may be severely degraded or may not be received at all, which interrupts continuous navigation [1]. Location based services (LBS) are useful to individual users, so inertial navigation and pedestrian dead reckoning (PDR) have been studied to help overcome the limitations of satellite signals. Inertial navigation requires accurate initial alignment and real-time heading information, and owing to gyroscope drift, it must be combined with other information for positioning. This increases the complexity of the information fusion algorithms and hardware, thereby raising the cost of pedestrian positioning.
PDR can achieve continuous position estimation when satellite signals cannot be used. When the sensor is attached to the body or to a handheld device, PDR can achieve better positioning performance than traditional inertial navigation, even when a tactical-grade sensor is used for the latter [2]. PDR comprises four phases: step detection, step (or stride) length estimation, heading estimation, and navigation results update. Because accelerometers placed on the body are motion-sensitive, their data can be processed to detect steps [3]. PDR has become an effective positioning technology, and statistical parameters of the acceleration signal can be used to estimate stride length.
Step detection algorithms include the zero crossing method, peak detection method, and autocorrelation method [4,5]. The stride length estimation algorithm is complex, because there may be a variety of motion patterns during walking or running. The patterns include walking slowly, walking normally, walking rapidly, and running.
PDR accuracy depends not only on the number of steps but also on step length, which differs greatly among individuals and also varies with walking speed. The step length can vary by nearly 40% among pedestrians walking at the same speed and by up to 50% throughout the range of walking patterns of an individual [6]. The simplest approach to step length estimation is to treat it as a constant for one person [7]; however, this model cannot adapt well to a change of pace. A linear relationship between step length and pedestrian height was presented in [8], but the variation of step length during walking was neglected. Yang and Li identified a close relationship between step frequency and step length and proposed an algorithm to estimate step length based on step frequency [9]; the authors of [10] adopted this linear model for PDR. However, step length is also related to acceleration variance [11], vertical velocity [12], and other quantities, so considering step frequency alone is not sufficient. Taking into account step intervals, acceleration variance, and inclination, the step length can be modeled by a multivariate equation [13]. A nonlinear model with only one coefficient was proposed to estimate step length [14,15]; however, this model's coefficient may vary between pedestrians. Chen et al. proposed a method to detect movement of the body by measuring electromyogram (EMG) signals of leg muscles [16], but it is inconvenient for pedestrians to strap EMG sensors to the gastrocnemius. Toth et al. developed a step length estimation algorithm based on artificial neural networks and fuzzy logic [17], which significantly improved estimation accuracy; however, a backpack system with a variety of sensors was used, so the structure was complex and the cost was relatively high. BP-ANN has also been used to characterize steps by Anacleto et al. [18], but only to learn gait behavior rather than to estimate step length.
The aim of this paper is to propose a universal model based on BP-ANN to estimate pedestrian stride length. Unlike the frequency model in [9,10] and the nonlinear model proposed in [14,15], this model does not need pedestrian parameters to be predetermined each time. To verify this, we used a net trained on the data of 13 test subjects to estimate the stride length of three other subjects; the results confirm the feasibility of the algorithm. This paper is organized as follows: the sensor hardware used in the experiments is presented in Section 2.1. Data collection and processing, and different models for data fitting, are described in Sections 2.2-2.5. Experiments and results are given in Section 3. Finally, Section 4 concludes the paper and discusses directions for future work.

Figure 1 shows the IMU used in the research. Its operating voltage is 3 V-6 V, and the measurement range of the three-axis MEMS accelerometer is ±16 g. The sensor data can be transmitted to a smartphone over Bluetooth at rates from 0.1 to 200 Hz. The IMU can output acceleration, angular velocity, magnetic intensity, and air pressure. In this paper, only the acceleration is analyzed and discussed in detail.

Data Collection. The acceleration waveforms differ when an IMU is placed on different parts of the body. In order to improve the accuracy of step detection and stride length estimation, the IMU is usually placed on the foot or leg. Lower placement is more sensitive to phases of the walking cycle [3]. When the IMU is attached to the foot, it can better reflect pedestrian movement, and step detection is also more reliable [19]; therefore, for data collection in this paper, the IMU was attached to the foot. Because stride length is related to walking patterns and also varies between pedestrians, we collected data from 13 test subjects using different walking patterns; these included walking slowly, walking normally, walking rapidly, and running. The data sampling rate was 100 Hz. The test was conducted in the straight corridor of the Second Floor, Building 9003, Tsinghua University. The experimental site and the placement of the IMU are shown in Figure 2.

As shown in Figure 3, the walking cycle of a pedestrian can be divided into two main phases: the stance phase and the swing phase. The stance phase starts with a heel strike moment and ends with a toe off moment, with each phase corresponding to a footstep [20]. Because the effects of heel strikes and toe off moments are distinct, in most cases it is not difficult to localize them. The data from the three-axis accelerometer were collected during walking but cannot be double integrated directly because of error accumulation; instead, the accelerometer was used as a pedometer, similar to [21], and the stride length was estimated from the statistical characteristics of the acceleration data. The $x$-axis, $y$-axis, and $z$-axis output data of the accelerometer are defined as $a_x$, $a_y$, and $a_z$, respectively. In the test, the $x$-axis corresponds to the forward direction, the $y$-axis to the leftward direction, and the $z$-axis to the direction given by the cross product of $x$ and $y$.
Figure 4 shows how $a_x$, $a_y$, and $a_z$ changed when one test subject walked normally.

Data Analysis. As shown in Figure 4, $a_x$, $a_y$, and $a_z$ show certain cyclical characteristics, especially $a_z$, owing to its regular change with the up-and-down movement of the foot [10]; this conforms to the characteristics of vertical acceleration discussed in [22,23]. However, to avoid the effects of sensor tilt and body swing, the magnitude of the acceleration, rather than a single acceleration component, is used for stride length estimation, as shown in (1). This is because the acceleration magnitude is a robust feature of the footstep and is insensitive to the orientation of the sensor unit [24,25].

$$\mathit{acc} = \sqrt{a_x^2 + a_y^2 + a_z^2} \tag{1}$$

where $\mathit{acc}$ is the acceleration magnitude. Use of the acceleration component for stride analysis is illustrated in Figure 5, where an obvious cyclical characteristic can be seen. Each walking cycle has a period in which the waveform changes sharply and a period in which it is approximately constant, and the two periods correspond to different phases.
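The orientation insensitivity of the acceleration magnitude can be checked numerically. The following minimal Python sketch computes the magnitude of a single three-axis sample and of the same sample after a simulated sensor tilt; the function names and the sample values are hypothetical and are not taken from the experimental data.

```python
import math


def acceleration_magnitude(ax, ay, az):
    """Return the magnitude of one three-axis accelerometer sample."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def rotate_about_x(ay, az, angle_rad):
    """Rotate the (y, z) components about the x-axis to mimic sensor tilt."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return c * ay - s * az, s * ay + c * az


# Hypothetical sample in m/s^2 (illustrative only, not measured data).
sample = (1.2, -0.4, 9.6)
tilted_y, tilted_z = rotate_about_x(sample[1], sample[2], math.radians(20.0))

upright = acceleration_magnitude(*sample)
tilted = acceleration_magnitude(sample[0], tilted_y, tilted_z)

# The individual components change under tilt, the magnitude does not.
print(f"upright magnitude: {upright:.6f}, tilted magnitude: {tilted:.6f}")
```

A rotation changes the individual components but preserves the vector norm, which is why the magnitude is a convenient input when the mounting angle of the sensor cannot be controlled precisely.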

Two Common Models for Data Fitting. In order to compare the effects of estimating stride length with different models, this section illustrates two common models used for data fitting. As shown in (2), an empirical nonlinear model can be used to estimate the stride length [14,15,25-27]:

$$L_{\text{stride}} = K \sqrt[4]{\mathit{Acc}_{V\max} - \mathit{Acc}_{V\min}} \tag{2}$$

where $L_{\text{stride}}$ is the stride length, $\mathit{Acc}_{V\max}$ (or $\mathit{Acc}_{V\min}$) is the maximum (or minimum) vertical acceleration in a stride, and $K$ is the personalized parameter. This model seems simple because it has only one coefficient, but in order to find the maximum and minimum vertical acceleration in each stride, initial alignment must be completed [28]. This can be done using accelerometers and magnetometers, and the vertical acceleration can then be obtained from the acceleration measurements as in [27,28]. Figure 6 shows the variation in vertical acceleration of test subject number 1 at normal walking speed.
In order to obtain the value of $K$ for test subject number 1, we examined the eight sets of data shown in Table 1, where $\mathit{Acc}_{V\max}$ is the mean value of the maximum vertical acceleration in the same walking pattern, $\mathit{Acc}_{V\min}$ is the mean value of the minimum vertical acceleration, and $\bar{L}_{\text{stride}}$ denotes the mean value of the stride length. In each walking pattern, the test subject walked twice, and two sets of data were collected.
From these data, the nonlinear model of test subject number 1 with the lowest sum of squared errors was obtained; this fitted model is equation (3). The frequency model is also widely chosen to estimate the stride length [9,10,24,29]; this model is shown as follows:

$$L_{\text{stride}} = a \cdot f_{\text{stride}} + b \tag{4}$$

where $f_{\text{stride}}$ is the walking frequency and $a$ and $b$ are coefficients.
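The two fits described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the one-coefficient nonlinear model is assumed to take the common fourth-root form, stride length = K * (AccVmax - AccVmin)^(1/4), and the frequency model is assumed to be a straight line in stride frequency. The function names and the numbers in the usage part are hypothetical and do not reproduce Table 1.

```python
def fit_nonlinear_k(acc_vmax, acc_vmin, stride_lengths):
    """Least-squares K for L = K * (acc_vmax - acc_vmin) ** 0.25 (assumed form)."""
    g = [(hi - lo) ** 0.25 for hi, lo in zip(acc_vmax, acc_vmin)]
    # Minimizing sum((L_i - K * g_i)^2) over K gives K = sum(L*g) / sum(g*g).
    return sum(l * gi for l, gi in zip(stride_lengths, g)) / sum(gi * gi for gi in g)


def fit_frequency_model(frequencies, stride_lengths):
    """Ordinary least-squares line L = a * f + b; returns (a, b)."""
    n = len(frequencies)
    mean_f = sum(frequencies) / n
    mean_l = sum(stride_lengths) / n
    sxx = sum((f - mean_f) ** 2 for f in frequencies)
    sxy = sum((f - mean_f) * (l - mean_l)
              for f, l in zip(frequencies, stride_lengths))
    a = sxy / sxx
    b = mean_l - a * mean_f
    return a, b


# Hypothetical per-pattern means (illustrative only, not the paper's data).
acc_vmax = [18.0, 24.0, 30.0, 40.0]   # m/s^2
acc_vmin = [2.0, 1.5, 1.0, 0.5]       # m/s^2
freqs = [0.8, 0.9, 1.0, 1.3]          # Hz
lengths = [1.10, 1.25, 1.40, 1.60]    # m

k = fit_nonlinear_k(acc_vmax, acc_vmin, lengths)
a, b = fit_frequency_model(freqs, lengths)
print(f"K = {k:.3f}, a = {a:.3f}, b = {b:.3f}")
```

Both fits return coefficients for a single subject, which is the reason they have to be repeated for every new pedestrian.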
It must be pointed out that the parameters of this model may vary between pedestrians. The linear model fitted for test subject number 1, equation (5), is shown in Figure 7.

BP-ANN is a powerful learning system that can achieve nonlinear mapping between inputs and outputs. It consists of an input layer, a hidden layer, and an output layer, and its classical architecture is shown in Figure 8. Its weights and thresholds are continuously adjusted to approximate the desired mapping between inputs and outputs.
The activation function of the hidden layer is the sigmoid function presented in (6), while the output layer uses a linear function:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{6}$$

where $x$ is the independent variable.

The input of neurons in the hidden layer can be described as

$$V_j = \sum_{i=1}^{N_0} w_{ij} x_i + \theta_j \tag{7}$$

where $V_j$ is the input of the $j$th neuron in the hidden layer, $w_{ij}$ is the connection weight from the $i$th neuron of the input layer to the $j$th neuron of the hidden layer, and $x_i$ denotes the input of the $i$th neuron of the input layer. $N_0$ is the number of neurons in the input layer, and $\theta_j$ is the threshold of the $j$th neuron of the hidden layer.

The outputs of neurons in the hidden layer can be formulated as (8) by using the sigmoid function shown in (6):

$$y_j = f(V_j) = \frac{1}{1 + e^{-V_j}} \tag{8}$$

where $y_j$ is the output of the $j$th neuron in the hidden layer and $V_j$ is the same as for (7). The output of BP-ANN is

$$\hat{y} = \sum_{j=1}^{N_1} w_j y_j \tag{9}$$

where $\hat{y}$ is the output of the neural network, $N_1$ is the number of neurons in the hidden layer, $w_j$ is the connection weight from the $j$th neuron in the hidden layer to the output layer, and the definition of $y_j$ is the same as for (8).

Figure 6: The variation in projected vertical acceleration of test subject number 1 at normal walking speed (horizontal axis: time in s).

Figure 7 (horizontal axis: stride frequency in Hz).
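The forward computation of (7)-(9) can be written out as a short Python sketch: a weighted sum plus threshold for each hidden neuron, a sigmoid, and a linear weighted sum at the output. The sketch assumes that the threshold is added to the weighted sum (the sign convention is an assumption here), and all weights and inputs below are hypothetical placeholders rather than trained values.

```python
import math


def sigmoid(x):
    """Logistic sigmoid used as the hidden-layer activation."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, theta_hidden, w_out):
    """One forward pass of a single-hidden-layer network with a linear output.

    x            : list of N0 input values
    w_hidden     : N1 rows, each holding the N0 weights into one hidden neuron
    theta_hidden : N1 thresholds (assumed to be added to the weighted sum)
    w_out        : N1 weights from the hidden neurons to the single output
    """
    # Input of each hidden neuron: weighted sum of the inputs plus threshold.
    v = [sum(w * xi for w, xi in zip(row, x)) + th
         for row, th in zip(w_hidden, theta_hidden)]
    # Output of each hidden neuron: sigmoid of its input.
    y = [sigmoid(vj) for vj in v]
    # Network output: linear combination of the hidden outputs.
    return sum(wj * yj for wj, yj in zip(w_out, y))


# Hypothetical network with 2 inputs and 3 hidden neurons (untrained values).
w_hidden = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
theta_hidden = [0.0, 0.1, -0.2]
w_out = [0.7, -0.3, 0.5]
output = forward([1.0, 2.0], w_hidden, theta_hidden, w_out)
print(output)
```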
In this paper, five variables which may be closely related to stride length are studied, using data collected from 13 test subjects aged 22-29. These variables are the mean stride frequency $f_{\text{stride}}$, the maximum acceleration in a walking cycle $\mathit{acc}_{\max}$, the acceleration standard deviation $\sigma_{\mathit{acc}}$, the mean acceleration $\overline{\mathit{acc}}$, and the height of the test subject $h$. These parameters were chosen because many papers have shown that they are closely correlated with stride length [3,8,9,14,15]; for example, [3] found that the acceleration variance and the maximum acceleration value had correlations with stride length of 76% and 68%, respectively, and [8] proposed using pedestrian height to estimate stride length. In this paper, we propose that the stride length be estimated by a BP-ANN with these five parameters as inputs.
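The five inputs described above can be computed per walking cycle from the acceleration magnitude samples. The following Python sketch is one possible way to do this, under the assumptions that the cycle boundaries are already known from step detection, that the stride frequency of a cycle is the inverse of its duration, and that the population standard deviation is used; the function name and the sample values are hypothetical.

```python
import statistics


def cycle_features(acc_magnitudes, sample_rate_hz, height_m):
    """Return [stride frequency, max acc, acc std, mean acc, height] for one cycle.

    acc_magnitudes : acceleration magnitude samples of a single walking cycle
    sample_rate_hz : sampling rate of the IMU (100 Hz in the experiments)
    height_m       : height of the pedestrian in meters
    """
    duration_s = len(acc_magnitudes) / sample_rate_hz
    return [
        1.0 / duration_s,                    # stride frequency in Hz
        max(acc_magnitudes),                 # maximum acceleration in the cycle
        statistics.pstdev(acc_magnitudes),   # standard deviation (population form assumed)
        statistics.mean(acc_magnitudes),     # mean acceleration
        height_m,                            # pedestrian height
    ]


# Hypothetical one-second cycle sampled at 100 Hz (illustrative values only).
cycle = [9.0] * 50 + [11.0] * 50
features = cycle_features(cycle, sample_rate_hz=100.0, height_m=1.75)
print(features)
```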
The elements of the input vector $X$ are as follows: $x_1 = f_{\text{stride}}$, $x_2 = \mathit{acc}_{\max}$, $x_3 = \sigma_{\mathit{acc}}$, $x_4 = \overline{\mathit{acc}}$, and $x_5 = h$. The desired network output is $L_{\text{stride}}$. The BP-ANN model can approach the desired output by training the network. The Neural Network Toolbox (NNT) in MATLAB is used to build the BP-ANN model, and the data collected from the 13 test subjects were randomly divided into three parts: 70% for training, 15% for validation, and 15% for testing.

The training data are used to train the neural net as follows. First, the data are processed by equation (7), using a random weight matrix $w$ and a random threshold $\theta$. Then, the data are processed by equation (8) in the hidden layer. Finally, the output values from the hidden layer are combined by equation (9) to obtain the output value of the BP-ANN. To make the output values approach the target values, the parameters $w$ and $\theta$ are adjusted dynamically. A criterion is needed to decide when to stop training, so the validation data are used for this purpose: by default, training ends when the validation error has failed to decrease six times consecutively, and the net corresponding to the lowest error is kept. The test data are used to evaluate the trained neural network. By adjusting the number of neurons in the hidden layer and conducting cross validation, we found that the best performance is obtained with 10 hidden neurons.
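The training procedure described above (random initialization, a 70/15/15 split, and stopping after six consecutive validation failures) can be illustrated with a small pure-Python sketch. It is not the toolbox implementation: plain stochastic gradient descent is swapped in for the toolbox's training algorithm, the thresholds are assumed to be added to the weighted sums, and the data are synthetic values generated only to make the example runnable. All function names are hypothetical.

```python
import copy
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(net, x):
    """Forward pass: sigmoid hidden layer, linear output. Returns (output, hidden)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + th)
              for row, th in zip(net["w_hidden"], net["theta"])]
    return sum(w * h for w, h in zip(net["w_out"], hidden)), hidden


def mse(net, data):
    """Mean squared error of the net over (input, target) pairs."""
    return sum((predict(net, x)[0] - t) ** 2 for x, t in data) / len(data)


def split_70_15_15(samples, seed=0):
    """Randomly split samples into training, validation, and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def train(train_set, val_set, n_hidden=10, lr=0.05, max_epochs=200,
          patience=6, seed=0):
    """Gradient-descent training with validation-based early stopping."""
    rng = random.Random(seed)
    n_in = len(train_set[0][0])
    net = {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                     for _ in range(n_hidden)],
        "theta": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    }
    best_net, best_val, fails = copy.deepcopy(net), mse(net, val_set), 0
    for _ in range(max_epochs):
        for x, t in train_set:
            out, hidden = predict(net, x)
            err = out - t
            for j in range(n_hidden):
                # Back-propagate the output error through hidden neuron j.
                grad_v = err * net["w_out"][j] * hidden[j] * (1.0 - hidden[j])
                net["w_out"][j] -= lr * err * hidden[j]
                net["theta"][j] -= lr * grad_v
                for i in range(n_in):
                    net["w_hidden"][j][i] -= lr * grad_v * x[i]
        val = mse(net, val_set)
        if val < best_val:
            best_net, best_val, fails = copy.deepcopy(net), val, 0
        else:
            fails += 1
            if fails >= patience:  # six consecutive validation failures
                break
    return best_net, best_val


def make_synthetic_samples(n, seed=0):
    """Synthetic (features, target) pairs; NOT a stride model, demo data only."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = [rng.random() for _ in range(5)]
        samples.append((x, 0.6 + 0.5 * x[0] + 0.3 * x[4]))
    return samples


if __name__ == "__main__":
    train_set, val_set, test_set = split_70_15_15(make_synthetic_samples(100))
    net, best_val = train(train_set, val_set)
    print("validation MSE:", best_val, "test MSE:", mse(net, test_set))
```

The sketch keeps a copy of the net with the lowest validation error, which mirrors the idea of selecting the training net corresponding to the lowest error. With real features, the inputs would normally be scaled to comparable ranges before training.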
In Figure 9(a), the abscissa represents the target value, the ordinate represents the output value, and the fit line represents the functional relationship between the target value and the output value. In theory the slope of the fit line should be one, meaning that $Y$ (the output value) equals $T$ (the target value), but this is very difficult to achieve. Instead, the value of $R$, the correlation between the actual outputs of BP-ANN and the target values, becomes an important criterion for judging the quality of the fit: the closer $R$ is to one, the better the data fitting result. It can be seen that the values of $R$ in the training part (upper left), validation part (upper right), test part (lower left), and the whole data set (lower right) are relatively high. Figure 9(b) shows the BP-ANN training errors. Instances in the figure denote the count of the errors, with the range of the errors divided into 20 equal-length segments, or 20 bins. The unit of the horizontal coordinate is the meter. The figure shows that the errors are small and relatively concentrated, meaning that, in most cases, the errors of the estimated values are within a small range. Figure 9(c) shows $R$ between outputs and target values over all data parts, plotted against the number of cross validations. $R$ ranges from 0.953 to 0.982 during cross validation, so the ANN is stable and the fit, that is, the weight and threshold matrices obtained, is good.
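The fitting criterion discussed above can be computed directly from the network outputs and the target values. The sketch below assumes that the criterion is the Pearson correlation coefficient, which is the usual meaning of R in regression plots; the function name and the example values are hypothetical.

```python
import math


def pearson_r(outputs, targets):
    """Pearson correlation coefficient between network outputs and targets."""
    n = len(outputs)
    mean_o = sum(outputs) / n
    mean_t = sum(targets) / n
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(outputs, targets))
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    return cov / math.sqrt(var_o * var_t)


# Hypothetical stride lengths in meters (illustrative only).
targets = [1.10, 1.25, 1.40, 1.60, 2.10]
outputs = [1.12, 1.22, 1.43, 1.57, 2.05]
print(f"R = {pearson_r(outputs, targets):.4f}")
```

A value of R close to one indicates a strong linear relationship between outputs and targets, but not necessarily a fit line with unit slope, so R complements rather than replaces an inspection of the errors themselves.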

Experiments and Results
To verify the performance of the proposed model, more experiments were conducted. First, to further compare the three models (frequency, nonlinear, and BP-ANN), the data of three new test subjects named A, B, and C were collected. These subjects covered a route of 30 m in the same building each time and used the same IMU at a sampling rate of 100 Hz. The heights of test subjects A, B, and C were 1.72 m, 1.81 m, and 1.75 m, and their ages were 27, 23, and 24, respectively. Table 2 lists the frequency model and nonlinear model coefficients of test subjects A, B, and C ($K$ as defined in (2), and $a$ and $b$ as defined in (4)), obtained after data processing using the method described in Section 2.4.
The feasibility of our BP-ANN model was then assessed by an experiment carried out on the Bauhinia playground of Tsinghua University. As shown in Figure 10, test subjects A, B, and C walked around the playground using both normal and diverse walking patterns. The output rate of the IMU had been configured as 100 Hz, and the ground distance covered was 500 m.
The variation in acceleration of subject A is shown in Figure 11. As we can see, the characteristics of a cycle are closely related to the walking pattern. When walking slowly or normally, there is an obvious stationary phase in each cycle. Moreover, the maximum value of a running cycle is considerably larger than that when walking slowly or normally.
For each walking cycle, we can find its time interval $\Delta t$ and calculate the stride frequency in real time by using

$$f_{\text{stride}} = \frac{1}{\Delta t} \tag{10}$$

We can then use formula (11), based on the vector $X = [f_{\text{stride}}, \mathit{acc}_{\max}, \sigma_{\mathit{acc}}, \overline{\mathit{acc}}, h]$ and the net previously obtained, to estimate the stride length in every cycle:

$$\hat{L}_{\text{stride}} = \mathrm{net}(X) \tag{11}$$

Because a real-time measurement of individual stride length is difficult, the mean error of the stride length is replaced by the error of the calculated distance. The reasons for this are as follows. First, it is feasible to compare the calculated distance with the real distance as an index to test the feasibility and validity of the stride length model [30]: if the stride length estimation is good, the error of the calculated distance will be small. Second, we define the following variables: $\hat{D}$ is the estimated covered distance, $D$ is the real distance, and $N$ is the number of strides taken to cover the distance, which can be counted because of the periodic change in acceleration. Additionally, $\bar{L}_{\text{est}}$ is the mean value of the estimated stride length, $\bar{L}$ is the mean value of the stride length, and $\bar{e}$ is the mean error of the stride length. We can then obtain the following equation:

$$\bar{e} = \bar{L}_{\text{est}} - \bar{L} = \frac{\hat{D}}{N} - \frac{D}{N} = \frac{\hat{D} - D}{N} \tag{12}$$

Therefore, the error of the calculated distance can be used to evaluate the stride length estimation algorithm. We can find the calculated distance by using

$$\hat{D} = \sum_{k=1}^{N} \hat{L}_k \tag{13}$$

where $\hat{L}_k$ is the estimated value of the stride length at the $k$th walking cycle and $N$ denotes the number of walking cycles.
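The evaluation step described above reduces to a few lines of arithmetic: the stride frequency is the inverse of the cycle duration, the calculated distance is the sum of the per-cycle stride estimates, and the distance error divided by the number of strides gives the mean stride error. The following Python sketch illustrates this; the function names and the numbers are hypothetical and do not reproduce Table 3.

```python
def stride_frequency(cycle_duration_s):
    """Stride frequency in Hz from the duration of one walking cycle."""
    return 1.0 / cycle_duration_s


def calculated_distance(estimated_stride_lengths):
    """Distance covered, as the sum of the estimated stride lengths."""
    return sum(estimated_stride_lengths)


def mean_stride_error(estimated_distance, real_distance, n_strides):
    """Mean stride length error implied by the distance error."""
    return (estimated_distance - real_distance) / n_strides


def relative_distance_error(estimated_distance, real_distance):
    """Relative error of the calculated distance."""
    return abs(estimated_distance - real_distance) / real_distance


# Hypothetical walk: 400 cycles, each estimated at 1.27 m, over a real 500 m.
estimates = [1.27] * 400
d_hat = calculated_distance(estimates)
print(f"distance: {d_hat:.1f} m, "
      f"relative error: {relative_distance_error(d_hat, 500.0):.2%}, "
      f"mean stride error: {mean_stride_error(d_hat, 500.0, len(estimates)):.4f} m")
```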
The values of distance calculated by the three models, and the errors, are shown in Table 3. As can be seen, compared to the frequency and nonlinear models, the error is reduced to about 2% by using the BP-ANN model. It should be pointed out however that [29] used wavelet transform and a moving average filter to preprocess the raw data, with the error of the stride length estimation at about 0.43%; and [28] proposed the use of a sensor-fusion algorithm with the error reduced to 0.2%. The errors were smaller, but the walking characteristic parameters were for a specific individual, so the generality and practicability of the algorithms were limited. In this paper, we wanted to find a universal model so that the BP-ANN trained net could be applied to others; the results show that the method is practical, but that further work needs to be done to improve the precision of the model.

Conclusions and Future Scope
In this paper, a pedestrian stride length estimation algorithm based on BP-ANN was proposed. Five variables in the walking cycle were used as the input vector to the BP-ANN.
To assess the validity and feasibility of our algorithm model, further experiments were carried out on the playground. The experimental results show that the error of our proposed algorithm in estimating stride length is approximately 2%, which is smaller than that of the frequency and nonlinear models. Furthermore, the BP-ANN algorithm does not require coefficients to be predetermined for each pedestrian. Therefore, the proposed algorithm performs better in estimating pedestrian stride length, but more work is needed to further improve the precision of the results. This paper provides one way to estimate stride length, but it is not a conclusive solution. In future work, we will consider designing a system in which the BP-ANN is trained for an individual walker only, in which case the precision of the results would probably improve. Additionally, with the increasing availability of large data sets, more data can be used to train the ANN, and the proposed algorithm may achieve higher accuracy. These ideas will be a focus of future work.