Daily Human Physical Activity Recognition Based on Kernel Discriminant Analysis and Extreme Learning Machine

Wearable sensor based human physical activity recognition has extensive applications in many fields such as physical training and health care. This paper focuses on the development of a highly efficient approach for daily human activity recognition using a triaxial accelerometer. In the proposed approach, a number of features, including the tilt angle, the signal magnitude area (SMA), and the wavelet energy, are extracted from the raw measurement signal via time domain, frequency domain, and time-frequency domain analysis. A nonlinear kernel discriminant analysis (KDA) scheme is introduced to enhance the discrimination between different activities. The extreme learning machine (ELM) is proposed as a novel activity recognition algorithm. Experimental results show that the proposed KDA based ELM classifier achieves superior recognition performance, with higher accuracy and faster learning speed than the back-propagation (BP) and support vector machine (SVM) algorithms.


Introduction
Recognition of physical activity plays an important role in many fields such as physical training and health care. In particular, as populations age, a growing number of elderly people live alone and urgently need advanced solutions for health monitoring, including physical activity recognition.
A number of key research issues are related to physical activity recognition, including how to improve the data collection mechanisms, how to select more effective features, and how to design high-performance classification algorithms. Different solutions have been proposed to address these issues, usually based on either video analysis or wearable sensor signal analysis. The video analysis approach needs to obtain position and attitude information from a series of body image sequences. Due to the complexity and variability of human physical activities, it often suffers from low accuracy and low efficiency in many practical scenarios. It also offers weak protection for the privacy of the monitored person [1,2]. With the development of microelectromechanical systems (MEMS) and wearable sensor networks, activity recognition approaches based on wearable sensors, especially accelerometers, have received increasing interest because they can be deployed easily anywhere and at any time [3][4][5][6][7][8].
In previous studies, many researchers used multiple sensors for physical activity recognition to improve the accuracy. However, the hardware obstructs the movements of the wearer and is not practical for long-term wearing. The cost of the system also increases with the number of sensors [3][4][5]. Therefore, more researchers have recently been seeking activity recognition approaches that use only one accelerometer to collect the signal [6][7][8].
Feature extraction is crucial for activity recognition. Useful features must be extracted from the raw measurement data to reduce the processing time and improve the recognition accuracy. Based on the accelerometer data, a number of features and feature extraction methods have been proposed, ranging from time domain analysis and frequency domain analysis to time-frequency domain analysis [4,9]. The time domain method extracts features, such as the mean, the standard deviation, and the correlation coefficient, directly from the collected signal. The frequency domain method extracts features from frequency-domain parameters, such as FFT coefficients. More recently, wavelet analysis, which can incorporate time and frequency information, has been used to perform the time-frequency analysis.
The classifier is the key to activity recognition. Besides high accuracy and short training time, the classifier is often expected to meet real-time and generalization requirements. A large number of classification methods have been investigated, including the artificial neural network (ANN) [7] and the support vector machine (SVM) [10], which have been widely used in machine learning and data analysis. However, these popular learning techniques often face challenging issues such as intensive human intervention, slow learning speed, and poor learning scalability [11][12][13]. Therefore, the recent extreme learning machine (ELM) [11] classifier is proposed in this paper as a highly efficient and accurate activity recognition approach.
ELM was developed as a single-hidden layer feed-forward network (SLFN) that randomly assigns the weights between the input nodes and the hidden nodes and analytically determines the output weights between the hidden nodes and the output nodes. It learns much faster than traditional gradient-based approaches, such as the back-propagation (BP) algorithm. ELM tends to reach a small norm of the network output weights and thus, according to Bartlett's theory [12], achieves better generalization performance. In theory, ELM with the same kernels outperforms SVM in both regression and classification applications [13,14]. ELM provides efficient unified solutions to generalized feed-forward networks including, but not limited to, neural networks, radial basis function (RBF) networks, and kernel learning [15]. Many improvements of ELM have been studied in recent years, such as fast online learning [16], large scale ELM [17], and voting based ELM [18]. Various applications based on ELM and its variants can also be found, including wireless indoor localization [19], multicollinear problems [20,21], and protein sequence classification [22].
The patterns of physical activities vary from simple activities (such as standing, sitting, and walking) to more complex action strings (such as eating, drinking, and cycling). When new activities are added, the features of existing systems may become ineffective, and the recognition accuracy may decrease significantly. Finding new features increases the workload of the researchers, and classifier design becomes more difficult as the number of features grows. Principal component analysis (PCA) [23] and linear discriminant analysis (LDA) [7,24] have been applied to select the most discriminative features for activity recognition. Kernel discriminant analysis (KDA) [10,25] is an extension of LDA that obtains nonlinear discriminating features by using the kernel technique to map the data to the feature space. In this paper, we introduce KDA to extract more meaningful features of the activities and integrate it with the ELM classifier to achieve improved classification performance.
The paper is organized as follows. The proposed approach is detailed in Section 2, including its general description, system design, data acquisition and preprocessing, feature extraction method, KDA, and the ELM algorithm. Experimental results are reported in Section 3. Finally, conclusions and future work are given in Section 4.

Proposed Approach
The proposed activity recognition approach uses a triaxial accelerometer for data collection. As shown in Figure 1, the overall approach consists of the following steps: data acquisition, preprocessing, feature extraction, KDA, and ELM training and classification.
2.1. Wearable Component. The wearable sensor in this paper employs an MPU-6000 sensor, which contains a triaxial accelerometer and a triaxial gyroscope. The triaxial accelerometer is used to collect the raw acceleration measurement signal. The sampling frequency is set at 50 Hz with the output ranging in [−4, +4]. The sensor transmits the data to a mobile phone via an HC-05 Bluetooth module, and the data are stored on the SD card of the phone.
The wearable sensor is small and convenient to carry and is worn on the thigh of the subject's right leg, as shown in Figure 2. When the subject wears the accelerometer while standing, the $x$-axis represents the acceleration in the lateral direction, the $y$-axis represents the acceleration in the longitudinal direction, and the $z$-axis represents the acceleration in the vertical direction. In order to facilitate observations, the measurement of each axis is divided by the gravitational acceleration (taken as 9.8 m/s$^2$); therefore, the data are multiples of the gravitational acceleration.

Data Acquisition and Preprocessing.
In the experiments, data were collected from 10 subjects (5 females and 5 males), who wore the sensor each day for a period of time while performing six different activities of daily life: sitting, standing, walking, running, going upstairs, and going downstairs. Figure 3 shows example acceleration signals of sitting and walking for each axis of the triaxial accelerometer.
As the sensor is influenced by gravity, the data are filtered by a high-pass filter with a cut-off frequency of 0.5 Hz to eliminate the influence of gravity. As the frequency of human daily activities is not too high, a low-pass filter with a cut-off frequency of 20 Hz is used to remove high frequency noise. In addition, the data are smoothed by a median filter to eliminate independent noise.
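The preprocessing chain described above can be sketched as follows. The source does not state the filter types or the median window length, so this sketch makes two labeled assumptions: an ideal FFT mask stands in for the high-pass and low-pass filters, and the median width of 3 samples is a hypothetical choice. The function names are illustrative only.

```python
import numpy as np

FS = 50.0  # sampling frequency in Hz, as stated in the text


def bandpass_fft(signal, fs=FS, low_hz=0.5, high_hz=20.0):
    """Keep only frequency bins in [low_hz, high_hz].

    Assumption: an ideal (brick-wall) FFT mask replaces the unspecified
    high-pass (0.5 Hz) and low-pass (20 Hz) filters of the paper.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * keep, n=len(signal))


def median_smooth(signal, width=3):
    """Sliding median with edge padding; `width` (odd) is a hypothetical value."""
    half = width // 2
    padded = np.pad(signal, half, mode="edge")
    shifted = np.stack([padded[i:i + len(signal)] for i in range(width)])
    return np.median(shifted, axis=0)


def preprocess(signal):
    """Remove the gravity offset and high frequency noise, then smooth."""
    signal = np.asarray(signal, dtype=float)
    return median_smooth(bandpass_fft(signal))
```

For example, a 2 Hz movement component riding on a constant 1 g offset comes out of `preprocess` with the offset removed, since the 0 Hz bin lies below the 0.5 Hz cut-off.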

Feature Extraction.
Features are extracted from the accelerometer data over a sliding window. In this paper, features are computed from 128 sampling points (2.56 s at 50 Hz) with 64 samples overlapping between consecutive sliding windows. Feature extraction on sliding windows with 50% overlap has been shown to be effective in previous studies on activity classification [4].
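As a minimal sketch of the windowing step, the function below cuts a recording into 128-sample windows with a 64-sample step (50% overlap). The array layout, one row per sample and one column per axis, and the function name are assumptions made for illustration.

```python
import numpy as np


def sliding_windows(samples, window=128, step=64):
    """Split an (n_samples, n_axes) array into overlapping windows.

    Returns an array of shape (n_windows, window, n_axes). With the
    50 Hz sampling rate from the text, each window spans 128 / 50 = 2.56 s.
    """
    samples = np.asarray(samples, dtype=float)
    starts = range(0, len(samples) - window + 1, step)
    windows = [samples[s:s + window] for s in starts]
    if not windows:  # recording shorter than one window
        return np.empty((0, window) + samples.shape[1:])
    return np.stack(windows)
```

A recording of 320 samples therefore yields four windows, starting at samples 0, 64, 128, and 192.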
A number of features, including the mean, the standard deviation, the median, the correlation, the tilt angle (TA), the signal magnitude area (SMA), the frequency energy, the frequency entropy, and the wavelet energy, were extracted from the signals for activity recognition. A brief description of each feature is given as follows.
(1) Tilt Angle (TA). Whatever the state of the device, gravity always acts vertically. When the device is tilted, gravity contributes a component along the sensing axis used for this feature. The tilt angle $\theta$ refers to the relative tilt of the body in space and is calculated from the gravity component of the acceleration along that axis and the gravity coefficient $g$.
(2) Signal Magnitude Area (SMA). We adopt the SMA to extract a feature quantity according to
$$\mathrm{SMA} = \frac{1}{N} \sum_{i=1}^{N} \left( |x(i)| + |y(i)| + |z(i)| \right),$$
where $x(i)$, $y(i)$, and $z(i)$ indicate the values of the $x$-axis, $y$-axis, and $z$-axis acceleration signals at the $i$th sampling point after preprocessing and $N$ is the length of the sliding window. SMA indicates the degree of fluctuation of the acceleration signal; the higher its value, the more violent the fluctuation.
(3) Wavelet Energy (WE). This paper uses db5 as the mother wavelet. The vertical component is decomposed into 5 levels, and WE is calculated as the sum of the squared detail coefficients at levels 4 and 5, according to
$$\mathrm{WE} = \sum_{j=4}^{5} \sum_{k} CD_j(k)^2,$$
where $CD_j$ are the level-$j$ detail coefficients of the vertical-axis acceleration signal.
(4) Other Features. Mean, standard deviation, and median are the average, standard deviation, and median value of the signal over the sliding window, respectively. The correlation coefficient is calculated between two axes of the accelerometer signal. The frequency energy feature is calculated as the sum of the squared magnitudes of the discrete FFT components of the signal. The frequency entropy is calculated as the normalized information entropy of the magnitudes of the discrete FFT components of the signal. Figure 4 shows some of these features for different activities. It is easy to distinguish some activities, such as sitting and standing, but the differences among walking, going upstairs, and going downstairs are not significant, which is likely to cause misrecognition.
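Three of the features above can be sketched in a few lines. The sketch rests on assumptions that the source does not confirm: SMA uses the common definition (per-sample sum of absolute values over the three axes, averaged over the window), the 0 Hz FFT bin is dropped from the frequency features, and "normalized" entropy is read as normalizing the FFT magnitudes to sum to one before taking the Shannon entropy in bits. All function names are illustrative.

```python
import numpy as np


def signal_magnitude_area(window):
    """SMA of a (window_length, 3) array: mean of |x| + |y| + |z| per sample."""
    window = np.asarray(window, dtype=float)
    return float(np.abs(window).sum(axis=1).mean())


def fft_magnitudes(signal):
    """Magnitudes of the one-sided FFT without the DC bin (assumption)."""
    return np.abs(np.fft.rfft(np.asarray(signal, dtype=float)))[1:]


def frequency_energy(signal):
    """Sum of the squared FFT magnitudes of one axis over a window."""
    return float(np.sum(fft_magnitudes(signal) ** 2))


def frequency_entropy(signal):
    """Shannon entropy (bits) of the FFT magnitudes normalized to sum to one."""
    mags = fft_magnitudes(signal)
    total = mags.sum()
    if total == 0.0:
        return 0.0
    p = mags / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log2(p)))
```

Under these assumptions, a window dominated by a single frequency has entropy close to zero, while a signal whose energy is spread across many bins has a higher entropy, which is what makes the feature useful for separating rhythmic from irregular movement.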

Kernel Discriminant Analysis (KDA).
In order to improve the classification accuracy, further operations on the derived features are performed.
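Linear discriminant analysis (LDA) is a supervised dimensionality reduction technique used for data analysis and pattern recognition. LDA tries to maximize the separation between different classes and minimize the separation within the same class simultaneously. KDA is a nonlinear extension of LDA that obtains nonlinear discriminating features by the kernel technique [26]. In KDA, input data are mapped to a high dimensional feature space $\mathcal{F}$ by a nonlinear feature mapping $\phi: \mathbb{R}^n \rightarrow \mathcal{F}$. We select the Gaussian radial basis function (RBF) as the kernel. In KDA, the cross class and intraclass scatter matrices are computed as
$$S_b^{\phi} = \sum_{i=1}^{c} n_i \left(\mu_i^{\phi} - \mu^{\phi}\right)\left(\mu_i^{\phi} - \mu^{\phi}\right)^T, \qquad S_w^{\phi} = \sum_{i=1}^{c} \sum_{x \in X_i} \left(\phi(x) - \mu_i^{\phi}\right)\left(\phi(x) - \mu_i^{\phi}\right)^T,$$
where $n_i$ is the number of samples from the $i$th class, $c$ is the number of classes, $\mu_i^{\phi}$ is the centroid of the $i$th class and $\mu^{\phi}$ is the global centroid, $x$ is a vector for a specific class, and $X_i$ is the set of samples of the $i$th class. $S_w^{\phi}$ represents the degree of scattering within classes of activities and is calculated as the summation of the covariance matrices of each class, whereas $S_b^{\phi}$ represents the degree of scattering between classes of activities and is calculated as the summation of the covariance matrix of the means of each class. The optimal discrimination transformation in the projection space is obtained by solving the following optimization problem:
$$V_{\mathrm{opt}} = \arg\max_{V} \frac{\left| V^T S_b^{\phi} V \right|}{\left| V^T S_w^{\phi} V \right|}.$$
The optimization solution $V_{\mathrm{opt}}$, corresponding to the largest eigenvalues $\lambda$, can be explained by the generalized eigenvalue problem:
$$S_b^{\phi} V = \lambda S_w^{\phi} V.$$
It is proved that the above equation can be equivalently represented as
$$\alpha_{\mathrm{opt}} = \arg\max_{\alpha} \frac{\left| \alpha^T K W K \alpha \right|}{\left| \alpha^T K K \alpha \right|}.$$
The corresponding eigenvalue problem is represented as
$$K W K \alpha = \lambda K K \alpha,$$
where $K$ is the kernel matrix with its element defined as
$$K_{ij} = k(x_i, x_j) = \left\langle \phi(x_i), \phi(x_j) \right\rangle,$$
where $k(\cdot, \cdot)$ is a positive semidefinite kernel function and $W$ is a matrix with its element defined as
$$W_{ij} = \begin{cases} 1/n_k, & \text{if } x_i \text{ and } x_j \text{ both belong to the } k\text{th class}, \\ 0, & \text{otherwise}. \end{cases}$$
More details on KDA can be found in [26].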
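The kernel eigenvalue problem described above can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the kernel matrix is centered in feature space (otherwise the leading eigenvector is a trivial constant projection), a small ridge term `reg` is added so the right-hand matrix is positive definite, and the kernel width `gamma` is a hypothetical value. The class name `KDA` and its attributes are illustrative.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KDA:
    """Kernel discriminant analysis sketch (gamma and reg are assumptions)."""

    def __init__(self, n_components=2, gamma=1.0, reg=1e-3):
        self.n_components, self.gamma, self.reg = n_components, gamma, reg

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        n = len(X)
        K = rbf_kernel(X, X, self.gamma)
        # Center the kernel matrix in feature space.
        self.X_, self.col_mean_, self.all_mean_ = X, K.mean(axis=0), K.mean()
        Kc = K - self.col_mean_[None, :] - K.mean(axis=1)[:, None] + self.all_mean_
        # Class-indicator matrix: 1/n_k when both samples are in class k, else 0.
        W = np.zeros((n, n))
        for label in np.unique(y):
            idx = np.flatnonzero(y == label)
            W[np.ix_(idx, idx)] = 1.0 / len(idx)
        A = Kc @ W @ Kc
        B = Kc @ Kc + self.reg * np.eye(n)
        # Reduce A a = lambda B a to a symmetric standard problem via B = L L^T.
        L_inv = np.linalg.inv(np.linalg.cholesky(B))
        C = L_inv @ A @ L_inv.T
        eigvals, eigvecs = np.linalg.eigh((C + C.T) / 2.0)
        top = np.argsort(eigvals)[::-1][: self.n_components]
        self.alphas_ = L_inv.T @ eigvecs[:, top]
        return self

    def transform(self, X_new):
        """Project samples onto the leading discriminant directions."""
        Kn = rbf_kernel(np.asarray(X_new, dtype=float), self.X_, self.gamma)
        Kn_c = Kn - Kn.mean(axis=1)[:, None] - self.col_mean_[None, :] + self.all_mean_
        return Kn_c @ self.alphas_
```

With $c$ classes there are at most $c - 1$ informative discriminant directions, so for three activities `n_components=2` is a natural choice.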

ELM Algorithm.
Extreme learning machine (ELM) was introduced by Huang et al. [11]. It was originally proposed for standard single-hidden layer feed-forward neural networks (SLFNs) with random hidden nodes and has recently been extended to kernel learning as well [27]. Compared with traditional training methods, ELM has the advantages of high accuracy, fast learning speed, and good generalization. It provides a unified learning platform with a wide range of feature mappings and can be applied directly in regression and multiclass classification applications. Figure 5 shows the structure of ELM. The network consists of an input layer, a hidden layer, and an output layer. For $N$ arbitrary distinct samples $(x_i, t_i)$, denote the input signal vector as $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ and the output signal vector as $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$; $w_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T$ is the weight vector connecting the $j$th hidden node and the input nodes, $\beta_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^T$ is the weight vector connecting the $j$th hidden node and the output nodes, and $b_j$ is the threshold of the $j$th hidden node.
The activation function of hidden layer neurons is $g(x)$, and the hidden layer has $L$ neurons. There exist $\beta_j$, $w_j$, and $b_j$ such that
$$\sum_{j=1}^{L} \beta_j \, g(w_j \cdot x_i + b_j) = t_i, \quad i = 1, \ldots, N. \tag{11}$$
Denote the output of the network as $T = [t_1, \ldots, t_N]^T$ and the output weights as $\beta = [\beta_1, \ldots, \beta_L]^T$. Then (11) can be rewritten as
$$H \beta = T,$$
where $H$ is the hidden layer output matrix of the ELM:
$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}.$$
According to the theorem proven in [11], when the activation function is infinitely differentiable and $L \le N$, not all parameters of the SLFN need to be adjusted. Parameters $w$ and $b$ can be selected randomly before training and remain constant during the training process. Training the SLFN is then simply equivalent to finding a least-square solution $\hat{\beta}$ of the linear system $H\beta = T$; that is,
$$\left\| H \hat{\beta} - T \right\| = \min_{\beta} \left\| H \beta - T \right\|.$$
Then the optimal output weights can be calculated as
$$\hat{\beta} = H^{\dagger} T,$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix $H$.
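The training procedure above (random hidden parameters, then a pseudoinverse solve for the output weights) can be sketched as follows. The sigmoid activation, the uniform sampling range [-1, 1], the one-hot target encoding, and the default of 50 hidden nodes are assumptions chosen for illustration; the source does not state the values used in the experiments.

```python
import numpy as np


class ELMClassifier:
    """Single-hidden-layer ELM sketch with a sigmoid activation (assumption)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden layer output matrix: one row per sample, one column per node.
        return 1.0 / (1.0 + np.exp(-(X @ self.w_ + self.b_)))

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix (hypothetical encoding of the class labels).
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and thresholds are drawn once and never updated.
        self.w_ = self.rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b_ = self.rng.uniform(-1.0, 1.0, self.n_hidden)
        # Output weights: least-squares solution via the Moore-Penrose inverse.
        self.beta_ = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta_
        return self.classes_[np.argmax(scores, axis=1)]
```

Because the only fitted quantity is obtained from a single linear solve, there is no iterative weight update, which is the source of the fast learning speed discussed in the text.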

Approach Implementation and Experimental Results
In this section, we will implement the approach for extracting features of the data, using KDA to map the features to the high dimensional feature space, applying ELM algorithm to classify the samples and comparing ELM with other classic classifiers.

Recognition Using KDA on Original Features.
The difference in the features among the three activities (walking, going upstairs, and going downstairs) is not very significant. Therefore, KDA is implemented to deal with the problem. The 3D plot in Figure 6 shows the low intraclass variance and high cross class variance for the three activities. Figure 6(a) shows the three features, SMA, TA, and WE, extracted from the collected samples. It is difficult to classify them. Figure 6(b) shows significantly improved discrimination among the three activities after the KDA operation.

Classification Results.
In this subsection, we compare the classification performance of the ELM, BP, and SVM classifiers. All the experiments are carried out in the MATLAB 8.0 environment running on an Intel i5, 2.6 GHz CPU. There are many variants of the BP and SVM algorithms. Levenberg-Marquardt BP (LM-BP) and least-square SVM (LS-SVM) [28] are used in this paper.
The data are randomly divided into two sets, namely, the training set and the testing set. The classification process consists of two parts: generating the model from the training set and testing the performance of the model on the testing set. In our experiments, all the inputs (attributes) have been normalized into the range [0, 1]. Table 1 shows the confusion matrices between the 6 classes of activities (sitting, standing, walking, running, going upstairs, and going downstairs) for the algorithms. The rows represent the actual class and the columns represent the class recognized by the algorithms, so the element at the $i$th row and $j$th column of the confusion matrix represents the probability that actual class $i$ is recognized as class $j$ by an algorithm. The probabilities were calculated from 1000 samples over the sliding windows of the six activities. From the matrices, we can see that ELM distinguishes the 6 classes of activities better than LM-BP and LS-SVM in general, and that applying KDA to the original features improves the classification performance.
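The normalization and the row-normalized confusion matrix described above can be sketched as follows. Taking the minimum and maximum from the training set only is an assumption (the source states only that inputs are normalized into [0, 1]), and the function names are illustrative.

```python
import numpy as np


def min_max_normalize(train, test):
    """Scale each attribute to [0, 1] using training-set statistics (assumption)."""
    train, test = np.asarray(train, dtype=float), np.asarray(test, dtype=float)
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (train - lo) / span, (test - lo) / span


def confusion_matrix(actual, predicted, labels):
    """Row i, column j: fraction of samples of class i recognized as class j."""
    index = {label: k for k, label in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for a, p in zip(actual, predicted):
        counts[index[a], index[p]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1.0, row_sums)
```

Each row of the returned matrix sums to one for every class that occurs in `actual`, so the diagonal entries can be read directly as per-class recognition rates.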
Furthermore, 50 trials have been conducted for a thorough comparison, and the average results are reported as the classification results in Table 2. The KDA based ELM classifier takes the shortest testing time (0.0012 s) and achieves the highest testing accuracy (99.81%). Although KDA based ELM takes slightly more training time than the original ELM, it gives a better classification result. This is because KDA achieves low intraclass variance and high cross class variance for the activities and thus improves the classification accuracy effectively.
As shown in Table 2, KDA based LM-BP classifies samples faster than the original LM-BP and has higher classification accuracy. KDA based LS-SVM also improves the accuracy. The KDA strategy is therefore useful in improving the recognition performance. We now compare the ELM, BP, and SVM classifiers. It can be seen from Table 2 that the classification accuracies of ELM, LM-BP, and LS-SVM based on the original features are not significantly different, but ELM learns much faster (up to hundreds of times) than LM-BP and LS-SVM. With KDA, ELM and LS-SVM achieve better classification performance than LM-BP, but LS-SVM needs a longer training time than ELM. Also, the parameters of LS-SVM and LM-BP need to be determined before training, while ELM selects parameters randomly before training. Overall, the experiments demonstrate that ELM achieves superior recognition performance compared with the LM-BP and LS-SVM classifiers.

Conclusion
This paper develops a highly efficient approach for human activity recognition based on ELM and using only one triaxial accelerometer. A number of features in the time domain, the frequency domain, and the time-frequency domain are defined and extracted from the raw measurement signals. KDA is performed on the original features to achieve low intraclass variance and high cross class variance for the activities. The ELM classifier is proposed to classify the activities. Experimental results show that the KDA based classifier improves the classification accuracy effectively and that ELM achieves superior recognition performance compared with the SVM and BP classifiers. Further research is required to determine the most appropriate feature set for more specific subject groups, such as the elderly or the neurologically impaired. In the future, we would also like to examine more complex physical activities and apply the ELM algorithm to solve more complex classification problems.

Figure 1: The architecture of the proposed activity recognition approach.

Figure 2: Wearable sensor being worn by a subject.

Figure 4: Difference in feature values for discriminating different activities.

Figure 5: The structure of ELM.

Figure 6: Features without and with KDA operations (3D feature space representation for KDA implementation on original features).

Table 1: The confusion matrices between the 6 classes of activities. R indicates running, ST indicates sitting, SD indicates standing, W indicates walking, U indicates going upstairs, and D indicates going downstairs.