Energy-Efficient Real-Time Human Activity Recognition on Smart Mobile Devices

Nowadays, human activity recognition (HAR) plays an important role in wellness-care and context-aware systems. Human activities can be recognized in real-time by using sensory data collected from various sensors built into smart mobile devices. Recent studies have focused on HAR that is solely based on triaxial accelerometers, which is the most energy-efficient approach. However, such HAR approaches are still energy-inefficient because the accelerometer is required to run without stopping so that the physical activity of a user can be recognized in real-time. In this paper, we propose a novel approach for the HAR process that controls the activity recognition duration for energy-efficient HAR. We investigated the impact of varying the acceleration-sampling frequency and window size for HAR by using the variable activity recognition duration (VARD) strategy. We implemented our approach on an Android platform and evaluated its performance in terms of energy efficiency and accuracy. The experimental results showed that our approach reduced energy consumption by a minimum of about 44.23% and a maximum of about 78.85% compared to conventional HAR without sacrificing accuracy.


Introduction
Interest in u-health and wellness-care has recently been growing [1][2][3]. Various technologies that recognize the physical activities of users by means of the embedded sensors in smart mobile devices are actively studied. Recognized physical human activities can be used to develop applications that predict a falling accident or measure calorie consumption [4][5][6][7]. Such applications mainly use the triaxial accelerometer because it consumes the least power compared to other available sensors [8,9]. Therefore, the use of "sensor" hereafter in this paper refers to a triaxial accelerometer.
These applications need the accelerometer to operate continuously without stopping in order to recognize different physical human activities in real-time. Unfortunately, this incurs unnecessary power consumption by the sensor and computational overhead; it is regarded as a big problem considering the limited power resources of smart mobile devices [10][11][12]. For example, while the battery life of the LG Optimus Pro reaches over 60 hours when all applications and sensors are turned off, it decreases to 22 hours when a human activity recognition (HAR) application is activated with a sensor (100 Hz).
One facile solution is to blindly limit the usage of the accelerometer, but this may cause another problem of sacrificing the accuracy of human activity recognition. Another solution is to adopt a lower acceleration-sampling frequency (SF) for the sensor, but this may result in the loss of important sampling data. For this reason, previous studies have mostly focused on achieving a rather suboptimal balance between energy efficiency and HAR accuracy, instead of seeking optimal power consumption without sacrificing the HAR accuracy [13][14][15][16]. An analysis of the previous studies showed that they required the accelerometer to be operating at all times; as a result, the power consumption due to the continuous operation of the sensor itself and the accompanying data processing by the CPU remains unaddressed. In this paper, we argue that it is possible to save energy to a great extent without continuous sensor operation.
In order to further improve the energy efficiency, we propose an approach that dynamically controls the variable activity recognition duration (VARD) for HAR. Our approach classifies a user's activities as dynamic or static and controls the classification duration and sleep time for the HAR process based on two factors: the acceleration-sampling frequency and window size (WS). We performed experiments and conducted a thorough analysis of the results to show that the proposed VARD strategy performs well in terms of both energy efficiency and HAR accuracy.
The remainder of the paper is organized as follows. Section 2 presents an analysis of previous HAR approaches for efficient power consumption. Section 3 describes our initial motivations and a basic HAR system. Section 4 presents the impact of varying the SF, WS, and feature vector dimensionality (FVD) on the classification accuracy and the power consumption. Section 5 explains the VARD strategy. Section 6 reports on the evaluation results for our approach. Finally, Section 7 concludes with a summary and future directions.

Related Works
In this section, we first present a variety of accelerometer-based HAR technologies and then discuss relevant previous studies.

Human Activity Recognition Using Accelerometer.
Early-stage researchers investigated wearable sensor-based HAR; they demonstrated that the use of wearable sensors can provide high accuracy in HAR [17][18][19][20]. More recent work has further improved wearable sensor-based HAR. Hong et al. [21] presented a personalized HAR system using a Bayesian network and a support vector machine (SVM).
Owing to the rapid advancement of smart mobile device technology, many researchers have focused on mobile device-based HAR. Their work [4, 22-24] has also been successful in providing a high recognition rate. Torres-Huitzil and Nuno-Maganda [25] showed a position-independent HAR system using time-domain features and a neural network. Vo et al. [13] presented a personalized HAR system through SVM, along with a k-medoids clustering method. Albert et al. [2] studied a HAR system for Parkinson's patients.
Smart mobile devices are a promising platform for HAR because they not only are equipped with built-in sensors but also are a natural part of everyday human life [26]. However, smart mobile devices need energy management because of their limited resources.

Human Activity Recognition with the Energy-Saving.
A naive solution to reducing the power consumption of mobile devices is to limit the usage of the accelerometer. However, such an approach may negatively affect the HAR accuracy and therefore should be applied with caution.
Vo et al. [13] aimed to reduce the power consumption of the accelerometer and CPU by improving the HAR algorithm. Their approach relied on an SVM and time-domain features and reduced the power consumption by about 6.7% when compared to a conventional approach adopting SVM and fast Fourier transform (FFT). However, they focused more on the HAR accuracy than on reducing the power consumption.
Vo et al. [14] and Yan et al. [15] improved the power consumption efficiency by changing the SFs of the accelerometer and classification features. The key concept was identifying the best combination of SF and classification feature for a specific activity. Their approach reduced the power consumption by about 20%-25% compared to previous approaches. However, their approach also requires continuous operation of the triaxial accelerometer when the application is running.
Liang et al. [16] reduced the power consumption of HAR by using lower SFs. They proposed a hierarchical recognition algorithm that uses time-domain features, frequency-domain features, and similarity measurements. Their algorithm applies a decision tree instead of SVM. In their results, the battery life was extended by 3.2 h. However, because this algorithm tried to use a lower SF, the HAR accuracy was at best over 85%, which is less than that of other studies [13][14][15].
In this paper, we propose a new approach for the HAR process that reflects the physical states of the mobile user. Our approach can secure a similar or higher HAR accuracy compared to previous approaches while providing better energy efficiency.

Human Activity Recognition on Smart Mobile Devices
In this study, our aim was to develop a lightweight HAR approach that uses the embedded accelerometer in smart mobile devices. To build a mobile HAR system on smart mobile devices, methods for sensor monitoring and real-time detection of user activity need to be considered, as depicted in Figure 1.
Typical HAR can simply be defined as the process of interpreting raw sensor data to classify a set of physical human activities [27]. Statistical machine learning techniques are used to infer information about the activities from raw sensor readings; this process usually includes a training phase and a predicting phase. The training phase requires collecting labeled data to learn the model parameters and build a training model from the collection. The predicting phase uses the training model to classify physical activities of users in the following sequence: preprocessing, segmentation, feature extraction, and classification. The following subsections explain the details of the proposed HAR process.

Collecting Acceleration.
Physical human activities consist of basic movements such as walking, sitting, standing, and running. We selected the six most common activities as target activities, which have been recognized in previous works [8, 13-16, 19, 24]. Table 1 presents the target activities for our study.
We collected data from the triaxial accelerometer (MPU-6050; maximum range: 39.227 m/s²; resolution: 0.001 m/s²) on the LG Optimus Pro (Android KitKat 4.4.2 OS) carried by two male subjects aged 28 and 32 years. The device was placed inside the back pocket of the subject's pants. With the Android operating system, four different SFs (NORMAL: 5 Hz, UI: 16 Hz, GAME: 50 Hz, and FASTEST) can be selected for the accelerometer. The FASTEST SF depends on the computational workload of each specific mobile device and thus can differ from device to device. For our device, the FASTEST frequency was 100 Hz. In this study, we collected training data for six activities from two subjects. For each activity, 30 samples per subject were collected at each of the four SFs; thus, we collected 1440 samples in total (6 activities × 2 subjects × 4 SFs × 30 samples). A sample was a unit with a single activity classification and corresponded to a window that contained a preset number of contiguous accelerometer readings, which we called the WS. Section 4 discusses the experiments performed with the above samples. Figure 2 illustrates an example of the acceleration signals of human activities on each axis. This example was obtained at an SF of 100 Hz and a WS of 128.

Preprocessing Data.
The preprocessing step consists of segmentation, calculation of the total magnitude (TM), and normalization. In the segmentation phase, the raw accelerometer data are segmented into windows of size WS, where WS/2 accelerometer samples overlap between two consecutive windows. Feature extraction has been successfully performed on windows with 50% overlap in previous work [17]. The TM is the intensity (vibration) of a user activity and is a significant metric for discriminating between activities [8,16,24]. The TM is computed as the Euclidean norm of the three axis accelerations, $\mathrm{TM} = \sqrt{a_x^2 + a_y^2 + a_z^2}$, where $a_x$, $a_y$, and $a_z$ denote the accelerations on the x-, y-, and z-axes. Finally, the raw data and TM data are normalized to have values in the range of (−1, 1) for later feature extraction and classification [28].
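The three preprocessing steps can be illustrated with a minimal Python sketch. It assumes that the TM is the Euclidean norm of the three axis readings and that normalization divides by the largest absolute value in the window; the normalization scheme is not specified above, so this choice, like all function names, is only illustrative.

```python
import math

def segment(samples, window_size):
    """Split a sequence into windows of window_size with 50% overlap."""
    step = window_size // 2
    return [samples[i:i + window_size]
            for i in range(0, len(samples) - window_size + 1, step)]

def total_magnitude(window):
    """Euclidean norm of each (x, y, z) accelerometer reading."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in window]

def normalize(values):
    """Scale values into [-1, 1] by the largest absolute value (assumed scheme)."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values] if peak else list(values)
```

With a WS of 128, consecutive windows produced by `segment` share 64 readings, which corresponds to the 50% overlap described above.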

Extracting Features.
The selection of proper features from raw data plays an important role in the HAR performance. In general, the relevant features extracted for HAR are grouped into three categories: (i) time-domain features such as the mean, standard deviation, energy, and correlation between axes [13][14][15][16][17]23]; (ii) frequency-domain features such as the FFT coefficient, zero crossing rate, and autocorrelation of the magnitude [13-17, 29, 30]; and (iii) other features such as wavelet features [16,29], the autoregressive coefficient [31], and discrete cosine transform coefficients [32].
The FFT coefficient demonstrates a higher average accuracy than the rest of the features [16,31]. Thus, the first 20 FFT coefficients (first five for each of the three axes and five from TM; see Sections 4.1 and 4.2) are selected for each window, as illustrated in Figure 4. The FFT coefficients on each axis reflect the amplitude of basic waves which can be combined to reconstruct the original signal. For FFT, we utilized the decimation-in-time (DIT) Radix-2 FFT [33], which recursively partitions a discrete Fourier transform (DFT) into two half-length DFTs of the even- and odd-indexed time samples.
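The recursive even/odd split of the DIT Radix-2 FFT and the assembly of a 20-dimensional feature vector can be sketched as follows. This is an illustrative sketch rather than the implementation used in our system: taking the magnitude of each coefficient and counting from the zero-frequency coefficient are assumptions, and the function names are hypothetical.

```python
import cmath

def fft_radix2(x):
    """Recursive decimation-in-time radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # half-length DFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # half-length DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def feature_vector(x_axis, y_axis, z_axis, tm, per_signal=5):
    """Concatenate the magnitudes of the first per_signal coefficients of each signal."""
    features = []
    for signal in (x_axis, y_axis, z_axis, tm):
        spectrum = fft_radix2(signal)
        features.extend(abs(c) for c in spectrum[:per_signal])
    return features
```

Because the radix-2 split halves the input at every level, the WS must be a power of two, which holds for the window sizes of 128, 256, and 512 considered in this paper.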

Classifying Activities.
The extracted feature vectors can be classified by using the SVM classifier, which is widely used for HAR [13][14][15]23]. LibSVM [34] was adopted to classify the dataset. SVM is a learning algorithm that separates training samples into their corresponding classes by maximizing the margin of a separating hyperplane between classes in order to solve the classification problem. SVM efficiently finds the complex hyperplane in nonlinear data by using the kernel trick. We used the radial basis function (RBF) kernel in order to map support vectors to multiple dimensions because there were 20 FFT attributes [23].
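For reference, the RBF kernel in its standard form, which is also the form LibSVM documents for its RBF kernel type, is given below. Here $\mathbf{x}_i$ and $\mathbf{x}_j$ stand for two 20-dimensional feature vectors and $\gamma > 0$ is the kernel parameter; the symbols are introduced only for this illustration, and the value of $\gamma$ used in our experiments is not stated here.

```latex
K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \,\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2\right), \qquad \gamma > 0
```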
Human activities were classified into two activity types, as given in Table 1: (i) the static activity type (SAT) includes "sitting" and "standing" and (ii) the dynamic activity type (DAT) includes "walking," "running," "ascending stairs," and "descending stairs." The SAT is equivalent to a nonmoving relaxed state, and the DAT denotes active movement. Our strategy exploits the fact that humans are likely to maintain the same activity type for some time, especially for the SAT.

Tradeoff between Energy and Accuracy
The effects of the SF, WS, and FVD on the classification accuracy and power consumption were evaluated, and the FVD and the combinations of SF and WS to be applied in our method were identified. To obtain the readings, we turned off the network interfaces and display of our mobile device during the experiment. We used the PowerTutor [35] utility to measure the power consumption.

Classification Accuracy and Acceleration-Sampling Frequency.
We investigated the impact of different SFs on the classification accuracy with a WS of 128 and an FVD of 20. Here, 2400 test samples were used (six activities × four SFs × 100 samples). As shown in Figure 5, high SFs normally produced better predictions, especially for the DAT cases. The SFs of 50 and 100 Hz recorded an average accuracy of 90% or more across the six activities and were sufficiently higher than the minimum SF of 20 Hz that is required to assess daily activities [36].

Classification Accuracy and Feature Vector Dimensionality with Differing Window Sizes.
Figure 6 illustrates how the classification accuracy changed with the number of coefficients for each WS. Using the first 20 FFT coefficients (first five for each of the three axes and five from TM) produced an accuracy of more than 90% for a WS of 128 or more. Our experiments showed a slightly different result compared to Preece et al. [29], who analyzed the discriminative ability of individual FFT coefficients. They found that applying the first 18 coefficients (first six on each of the three axes) produced the maximal accuracy. This discrepancy may be due to our incorporation of TM coefficients in our feature vectors.

Power Consumption and Feature Vector Dimensionality.
For this experiment, we set the SF and WS to 100 Hz and 128, respectively. The SF of 100 Hz had the best classification accuracy, as shown in Figure 5, and the WS of 128 had a prediction accuracy of over 90%, as shown in Figure 6. Figure 7 plots the power consumption over 30 min against different numbers of FFT coefficients. The power consumption showed a quadratic increase with the dimensionality. Based on the results shown in Figures 6 and 7, we selected an FVD of 20 in our study. This had the least power consumption among FVDs with an accuracy of more than 90%.

Power Consumption and Acceleration-Sampling Frequency with Differing Window Sizes.
A high frequency mandates more frequent raw data collection. Larger WSs normally consume less power because they decrease the number of classifications, which take up a large proportion of the power consumption. Table 2 summarizes our investigations. We adopted SFs (50 Hz and 100 Hz), WSs (128, 256, and 512), and an FVD (20), which yielded an accuracy of 90% or more with low power consumption.

Experiments on the Variable Activity Recognition Duration Strategy
To monitor user activities on smart mobile devices in an energy-efficient manner, our study focused on two key ideas. First, humans more often tend to maintain the same activity than change from one activity to another (e.g., walk-to-run and sit-to-stand). When one activity is recognized in succession, we assume that the activity will last for a while. Therefore, we focused on developing an energy-saving scheme that increases the classification duration in this situation. If we increase the period over which an activity is recognized, the frequency of activity recognition within a given time decreases. Consequently, this reduces the power consumption necessary for activity recognition. To increase the classification duration, we adopted a method that lowers the SF and/or increases the WS. We verified that a low SF and large WS consume less power, as shown in Figure 8.
Second, dynamic activity (e.g., walking and running) is more meaningful than static activity (e.g., sitting and standing), which is equivalent to a nonmoving relaxed state, because it can be used as data for dynamic health information such as calorie consumption. Thus, we first classified a user's activities as a DAT or an SAT, as indicated in Table 1. Then, when an SAT is recognized, we pause the HAR process in order to save more energy.
Based on these ideas, we applied different strategies for each type with regard to the classification duration. To control the duration, the SF and WS were used for the DAT, and a sleep time was additionally used for the SAT. We call this energy-saving scheme the variable activity recognition duration (VARD) strategy.

Variable Activity Recognition Duration Strategy for the Dynamic Activity Type.
To increase the classification duration, we can lower the SF and/or increase the WS. However, a low SF and large WS are insensitive to rapidly changing activities because they produce classifications less often than a high SF and small WS. Therefore, our strategy is to start with a high SF and small WS to quickly identify changing activities. If the same dynamic activity is maintained for a long time, we assume that the same activity will continue and then lower the SF and increase the WS.
To guarantee the energy efficiency and high accuracy of HAR, we can choose SFs of 50 and 100 Hz, as shown in Figure 5, and WSs of 128, 256, and 512, as shown in Figure 6. Each SF and WS can be combined for a total of six combinations. The classification durations of ⟨100 Hz, 256⟩ and ⟨50 Hz, 128⟩ are the same at 2.56 s.
However, the power consumption of ⟨50 Hz, 128⟩ (573 mWh) is less than that of ⟨100 Hz, 256⟩ (832 mWh), as shown in Figure 8. Another difference between the two combinations is that the larger WS provides better HAR accuracy because it extracts more precise features when the raw data contain noise, namely, the tail of the previous activity's acceleration at an activity change, as shown in Figure 9. These two differences pull in opposite directions for the energy efficiency and HAR accuracy. When two combinations have the same classification duration, we choose the more energy-efficient one to focus on saving energy.
Accordingly, we adopted four combinations for the strategy with the DAT, as listed in Table 3: ⟨100 Hz, 128⟩, ⟨50 Hz, 128⟩, ⟨50 Hz, 256⟩, and ⟨50 Hz, 512⟩. We used the repeating count of the same activity in order to check that the same activity is continuous. A threshold for this count was set, and we implemented a strategy of changing from the current combination to the next combination with a lower frequency and larger WS if the count exceeds the threshold. Each step away from the first combination improves the energy efficiency and marginally weakens the HAR accuracy.
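The DAT strategy can be summarized by the following sketch, which computes the classification duration as WS divided by SF and steps through the four adopted combinations. The class and method names are illustrative, and the threshold passed to the constructor is a hypothetical value, since no concrete threshold is given above.

```python
# (SF in Hz, WS in samples), in the order adopted for the DAT strategy.
DAT_CONFIGS = [(100, 128), (50, 128), (50, 256), (50, 512)]

def classification_duration(sf_hz, ws):
    """Seconds of signal needed to fill one window."""
    return ws / sf_hz

class DatStrategy:
    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical value; not specified in the text
        self.index = 0              # position in DAT_CONFIGS
        self.repeat_count = 0       # repeating count of the same activity
        self.previous = None

    def update(self, activity):
        """Feed the latest recognized dynamic activity; return the next (SF, WS)."""
        if activity == self.previous:
            self.repeat_count += 1
            if self.repeat_count > self.threshold:
                # Same activity persists: move to a lower SF / larger WS.
                self.index = min(self.index + 1, len(DAT_CONFIGS) - 1)
                self.repeat_count = 0
        else:
            # Activity changed: fall back to the most responsive combination.
            self.index = 0
            self.repeat_count = 0
        self.previous = activity
        return DAT_CONFIGS[self.index]
```

Under this sketch, the four combinations correspond to classification durations of 1.28 s, 2.56 s, 5.12 s, and 10.24 s, so each step roughly halves how often a classification is performed.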

Variable Activity Recognition Duration Strategy for the Static Activity Type.
Our strategy for the SAT is based on a concept similar to that of the DAT strategy. However, there is no need to recognize an SAT often because there is less movement compared with a DAT. Our SAT strategy, therefore, uses a sleep time during the HAR process along with the SF and WS for better energy efficiency compared to the DAT strategy. In addition, a DAT should be stably recognized in the SAT state because it is more important than the SAT for extracting processed information.
In our strategy, when an SAT is recognized during the classification of human activity, the process takes a break. After the break, the human activity is reclassified. As a result, the classification duration increases within a given time because this strategy incorporates a sleep time.
To ensure stable HAR accuracy while reducing energy consumption, this strategy involves sleeping for 0 s when an SAT is initially recognized and gradually increasing the sleep time in increments of 1 s whenever an SAT is continuously recognized.
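The growth of the sleep time can be sketched as follows. The cap `max_sleep_s` is a hypothetical parameter standing in for a maximum sleep time; the function names are likewise illustrative.

```python
def next_sleep_time(current_sleep_s, max_sleep_s, step_s=1):
    """Sleep time after one more consecutive SAT recognition, capped at max_sleep_s."""
    return min(current_sleep_s + step_s, max_sleep_s)

def sleep_schedule(consecutive_sat, max_sleep_s):
    """Sleep times (in s) for a run of consecutive SAT recognitions, starting at 0 s."""
    schedule = []
    sleep_s = 0
    for _ in range(consecutive_sat):
        schedule.append(sleep_s)
        sleep_s = next_sleep_time(sleep_s, max_sleep_s)
    return schedule
```

For example, five consecutive SAT recognitions with a cap of 3 s would sleep for 0, 1, 2, 3, and 3 s in this sketch.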
The power consumption can be reduced with a break. Nevertheless, the extent to which the break can be increased while ensuring stable HAR accuracy needed to be evaluated. Therefore, we investigated the HAR accuracy with six combinations: ⟨100 Hz, 128⟩, ⟨50 Hz, 128⟩, ⟨100 Hz, 256⟩, ⟨50 Hz, 256⟩, ⟨100 Hz, 512⟩, and ⟨50 Hz, 512⟩. This was done in order to calculate the preferred maximum sleep time. In this experiment, the HAR accuracy was observed as the break was increased from 0 s to 60 s for each combination. The observation times for each break were 5 and 10 min. We made a total of 732 observations (6 × 61 × 2) of the HAR accuracy. Figure 10 plots the observed HAR accuracy.
This experiment showed that the HAR accuracy became unstable whenever the break exceeded a certain length. The circular symbols in Figure 10 mark the break beyond which the HAR accuracy fluctuated badly. This point was the limit to the break for each combination. The limit is determined by the SF, the WS, and a constant of 30 s obtained in this experiment. Based on this limit, we can guarantee efficient power consumption and stable accuracy during HAR.
Figure 11 plots the measured power consumption of six combinations in 2 h with a preset maximum sleep time: ⟨100 Hz, 128⟩, ⟨50 Hz, 128⟩, ⟨100 Hz, 256⟩, ⟨50 Hz, 256⟩, ⟨100 Hz, 512⟩, and ⟨50 Hz, 512⟩. The power consumption increased with a larger WS relative to a small WS, and changes to the SF had less effect on the power consumption than changes to the WS. This is because the numbers of activity recognition processes for every combination within a given time are equal if the HAR process has a sleep time, and a large WS increases the computational cost of HAR. As a result, the combinations with a large WS consumed more power. Therefore, using a small WS can ensure high energy efficiency.
As shown in Figure 10, however, the average accuracy is higher for large WSs than for small WSs. Thus, we adopted three combinations for the SAT strategy: ⟨100 Hz, 512⟩, ⟨100 Hz, 256⟩, and ⟨100 Hz, 128⟩. As indicated in Table 4, we defined the VARD combination configuration for the SAT strategy. We used the repeating count of the SAT in order to check that the type is continuous and employed a strategy of changing from the current combination to the next combination with a smaller WS if the count exceeded a threshold based on the sleep time limit. Progressing to further configurations away from the first combination increases the energy efficiency and destabilizes the HAR accuracy.
[...] 3 and 4. These models are built by an offline SVM using the training samples discussed in Section 3.1.
By classifying a recognized activity as a DAT or SAT, the HAR process transfers from the Sensing State to the state for each type. For a DAT, the HAR process goes into the Dynamic State to perform the DAT strategy. Otherwise, the SAT strategy is performed for the Static State. When the SAT strategy is performed, the HAR process transfers to the Sleeping State unconditionally and takes a break. This break time is set by the repeating count of the SAT. After the break, the process returns to the Sensing State in order to reclassify the human activity.
When an event listener for the triaxial accelerometer is registered in the initial Idle State, the HAR process transfers to the Active State. The Active State comprises two substate machines: the Sensing State and the Sleeping State. In the Active State, the process initializes the SF and WS and loads the classification model for this combination. It also sets a threshold for the repeating count of the same activity. The HAR process transfers to the Sensing State after the accelerometer is started. While this transition is performed, the repeating count of the SAT and the repeating count of the same activity are initialized to zero. When all of the initializations are completed, the Sensing State begins so that a human activity can be recognized. This portion is equivalent to lines (1)-(10) in Algorithm 1.
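The transitions described above can be sketched as a small transition function. This is a simplified illustration: the Active State is collapsed into its substates, the return from the Dynamic State to the Sensing State is an assumption, and all identifiers are hypothetical.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    SENSING = auto()
    DYNAMIC = auto()
    STATIC = auto()
    SLEEPING = auto()

# Static activity type (SAT); every other recognized activity is treated as a DAT.
STATIC_ACTIVITIES = {"sitting", "standing"}

def next_state(state, activity=None):
    """Return the state that follows `state`; `activity` is only read in SENSING."""
    if state is State.IDLE:
        return State.SENSING      # listener registered and accelerometer started
    if state is State.SENSING:
        return State.STATIC if activity in STATIC_ACTIVITIES else State.DYNAMIC
    if state is State.STATIC:
        return State.SLEEPING     # unconditional break after an SAT
    # DYNAMIC (assumed) and SLEEPING both return to sensing for reclassification.
    return State.SENSING
```

Keeping the transition logic in a pure function of this kind makes it straightforward to check the Sensing, Static, Sleeping, Sensing cycle without running the sensor.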
When a recognized activity is a DAT, the HAR process is transferred to the Dynamic State. In this state, the repeating count of the same activity and maximum sleep time are initialized. In the Dynamic State, the current activity is checked to see if it is equivalent to the previous activity. If they are the same, the repeating count of the same activity is increased. If this count exceeds the threshold, then the current VARD configuration is changed to the next combination, and the count is initialized. If the current and previous activities are not the same, the repeating count of the same activity is initialized, and the VARD configuration is changed to the first combination.
[...] SVM case. VARD with SAT had the minimum power consumption and consumed 3% more power than with no HAR. Finally, VARD with daily activities showed a reduction of 36% in energy consumption compared to typical SVM. The increase in energy efficiency compared to typical SVM was computed by (power (Typical SVM) − power (type))/power (Typical SVM). The increase in efficiency was about 44.23% for VARD with dynamic activity only, about 78.85% for VARD with static activity only, and about 69.23% for VARD with daily activities.
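The efficiency computation above can be written as a one-line function. The function and argument names are illustrative, and the power values in the usage note are arbitrary numbers chosen only to show the arithmetic, not measurements.

```python
def efficiency_gain(typical_power, strategy_power):
    """Relative energy saving of a strategy against the typical-SVM baseline."""
    if typical_power <= 0:
        raise ValueError("typical_power must be positive")
    return (typical_power - strategy_power) / typical_power
```

For instance, a hypothetical baseline of 100 units and a strategy consuming 40 units would give `efficiency_gain(100.0, 40.0)`, that is, a saving of 0.6 or 60%.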

Human Activity Recognition Accuracy.
The confusion matrix in Table 5 represents HAR errors for a real dataset (six activities × 100 samples). The confusion matrix shows that 5% of "walking" was misclassified as "ascending stairs" and 6% of "ascending stairs" was misclassified as "walking." Also, 8% of "sitting" was misclassified as "standing" and 5% of "standing" was misclassified as "sitting." The experimental results showed that the average HAR accuracy was 92.17%. If the activities "sitting" and "standing" are unified into a relaxation activity, the HAR accuracy for an SAT would be 99.5%.

Conclusions
Conventional HAR using the built-in accelerometer in smart mobile devices still has high power consumption due to not only the sensor itself but also the accompanying CPU computation overhead. Motivated by this challenge, we presented a new approach for energy-efficient real-time HAR on smart mobile devices. The experimental results showed that our method can achieve an average energy saving of greater than 64% compared to conventional HAR (SVM). We also showed that the average HAR accuracy was about 92% with six different activities. Moreover, we reported on how the SF, WS, and FVD alter the battery power consumption behavior with HAR. This report may be helpful to the field of HAR. However, if the Sleeping State persists for a long time, sudden human activities such as a fall cannot be recognized properly.
In order to solve this problem, future work on improving the accuracy for recognizing sudden activity changes is needed.

Figure 1: Human activity recognition using a single accelerometer.

Figure 2: An example of acceleration signals of target activities on three axes.
Figure 3 plots the acceleration on each axis and the TM data for a sample of the "walking" activity.

Figure 3: Normalized amplitude of the "walking" activity on the x-, y-, and z-axes and the total magnitude.

Figure 4: 3D acceleration and total magnitude after fast Fourier transform.

Figure 5: Accuracy across six activities with different acceleration-sampling frequencies.

Figure 6: Accuracy versus feature vector dimensionality with different window sizes.

Figure 7: Power consumption of feature vector dimensionalities at an acceleration-sampling frequency of 100 Hz and window size of 128 over 30 min.

Figure 8 illustrates the power consumption for different SFs and WSs with an FVD of 20 over 2 h. The power consumption clearly increases with the SF.

Figure 8: Power consumption at different acceleration-sampling frequencies and window sizes over 2 h.

Figure 9: Comparison of feature extraction precision with noise.

Figure 10: Recognition accuracy of human activities with variable activity recognition duration. The circle of each combination represents the break where the human activity recognition accuracy violently fluctuates.

Figure 11: Power consumption of each of six combinations with a maximum sleep time of 2 h.

Figure 13: Comparing power consumption of different types.

Table 1: Classification of target activities.

Table 2: Summary of tradeoff between energy and accuracy.

Table 3: Variable activity recognition duration configuration for dynamic activity type.

Table 4: Variable activity recognition duration configuration for static activity type.

Table 5: Confusion matrix of human activity recognition with variable activity recognition duration.