A Mobile Application for Easy Design and Testing of Algorithms to Monitor Physical Activity in the Workplace



Introduction
The ubiquity of smartphones, together with their ever-increasing computing, networking, and sensing capabilities, has changed the landscape of people's daily life. Among other topics, activity recognition, which takes raw sensor readings as input and predicts a user's activity, has become an active research area in recent years [1][2][3]. Activity recognition aims to understand the actions and goals of one or more humans from a series of observations of their actions and of the environmental conditions. Indeed, Human Activity Recognition (HAR) has become a task of great interest, especially for medical, military, and security applications. For instance, patients with diabetes, obesity, or heart disease are often required to follow a well-defined physical training program as part of their treatment. Therefore, the ability to automatically recognize activities such as walking, running, or resting becomes a powerful tool to encourage patients and to provide caregivers with feedback on their behavior. Application areas for HAR include [4] daily life monitoring [5][6][7][8], personal biometric signature [9], elderly and youth care [10][11][12], and localization [13,14].

Mobile Information Systems
A necessary prerequisite for systems aimed at stimulating physical activity (PA) is to have monitoring capabilities enabled by HAR. The importance of promoting PA through virtual coaching is motivated by recent research outcomes that correlate sedentary behaviors with "an elevated risk of diabetes, cardiovascular disease, and all-cause mortality" [15]. Worsening of other health conditions, like metabolic syndrome, type-2 diabetes mellitus, and obesity, is also strongly associated with increased inactivity. Unfortunately, modern workplaces are typically populated by almost inactive adults who spend several hours sitting [16], and a 2-hour increase of this kind of "occupational" inactivity has been related to a 5-7% increase of the health risks highlighted above [17]. Only a small part of the adult population (18 to 64 years old) in developed countries meets the Global Physical Activity (GPA) guidelines, which recommend at least 150 min of "moderate to vigorous" PA per week. Weight gain, up to obesity, is another side effect of a sedentary lifestyle: in addition to medical costs, it also causes significant economic losses, due to missed working hours, decreased productivity, and disability [18].
According to the previous discussion, replacing the sitting time spent at the workplace with low-intensity PA may help prevent chronic diseases. Some exotic solutions have been proposed, such as workstations that allow the worker to stand or walk, using a specially designed standing or treadmill desk [19]. Stimulating PA through a virtual coach may be a feasible solution and, to this aim, precise monitoring of daily activity in the workplace is an extremely important task.
This work presents a mobile application, called Actimonitor Android, developed as a tool for rapid and easy testing of algorithms designed to accurately monitor daily activity in the workplace. The accelerometer onboard mainstream smartphones is used, and the feasibility of implementing even complex HAR systems on a smartphone is demonstrated. The tool is first developed and tested in an offline learning phase. Afterwards, it is executed on a mobile platform. Typical smartphone-related constraints, such as available computational resources, memory, and battery power, raise specific challenges for highly demanding mobile applications like HAR, which require feature extraction, classification, and transmission of significant amounts of raw data. Moreover, current open source machine learning (ML) application programming interfaces (APIs), such as the Waikato Environment for Knowledge Analysis (WEKA) [20] and Java Data Mining (JDM), are neither designed nor optimized to run with full functionality on mobile platforms. Thus, a relevant problem addressed in this work is the mobile implementation of a HAR system that meets response time and energy consumption requirements.
The paper is organized as follows: Section 2 introduces the HAR problem, discussing the role of sensors and the state-of-the-art algorithms for activity recognition. In Section 3 the datasets, tools, and methodologies used for the experiments are presented, together with the mobile application developed for HAR algorithm design. Experimental results are discussed in Section 4; finally, Section 5 concludes the paper.

Review of Literature.
A classic ML approach is adopted in HAR systems, in which classification is performed on features extracted from raw sensor data that have been properly collected, preprocessed, and arranged into time-based segments. From data to features, an abstraction process takes place, in which statistical or frequency-domain properties capture meaningful information over each data segment, to feed a classifier. Feature selection may be necessary to reduce the dimension of the data handled by the classification algorithm, which is designed on a training data subset and evaluated on a testing data subset.
Most of the research on HAR through mobile devices has been carried out using sensor data collected from smartphones but subsequently processed offline by means of ML toolboxes, such as WEKA [20]. As previously mentioned, smartphones have traditionally been considered devices with limited resources, in terms of computational processing and battery lifetime [21]. While it is still important to consider these limitations when developing HAR systems for smartphones, such devices have become increasingly capable of running complex HAR in real time. Nevertheless, challenges remain in the evaluation of HAR solutions, particularly across the wide variety of hardware and software components now available. While a wide range of studies have reported and reviewed offline HAR (e.g., [22,23]), only a few have fully implemented HAR on mobile phones for real-time processing [24]. Such an implementation should include sensing, preprocessing, and classification, all carried out locally on the device.
Data provided by an accelerometer and a gyroscope onboard an Android smartphone carried in a pocket have been used by Dernbach et al. [25] to recognize simple actions (sitting, walking, running, and standing) and even more complex activities (cleaning, cooking, washing hands, and taking medication). Recognition of simple actions was attained with 93% accuracy, using a Multilayer Perceptron classifier and a two-second time window. The inclusion of complex activities in the dataset reduced accuracy to 50%; still, these results are promising for the current work, which seeks to identify simple physical activities such as walking, standing, and sitting in an office environment, using data from a single smartphone sensor. Classification for this study was, however, carried out offline using the WEKA toolkit and therefore needs to be implemented and tested in real time. Kim et al. [26] developed a HAR solution to assess physical activity and energy consumption in various buildings. This solution, developed for Android devices, recognized walking, climbing and descending stairs, running, and no movement. A Support Vector Machine (SVM) used data from accelerometer, gyroscope, and magnetometer to provide the classification, achieving high accuracy (98.26%).
Among the classifiers implemented and tested on mobile phones in the last few years, it is possible to mention Decision Trees (DT) [22,23], SVM [27], k-Nearest Neighbor (k-NN) [24], and naïve Bayes [28]. Multilayer or hierarchical classification is obtained by combining classifiers in different ways. Reddy et al. [29] combined a DT and a Dynamic Hidden Markov Model (DHMM), achieving an accuracy of 93.6% over a dataset from sixteen actors. In the majority of studies, classifiers are trained offline, using representative data, because training is computationally expensive and does not match real-time requirements. Then, the classification is implemented in real time. Recently, Google released a real-time activity recognition API [30]; however, this is limited to motion-related activities (walking, cycling, and driving) and does not include static activities such as standing still or sitting, which are of interest in this work.
Real-time feedback to the user is another important aspect of both context-aware and healthcare applications, particularly when trying to promote PA. However, in a large number of studies this feature is missing [31]. A system developed by Lane et al. [28] provided real-time feedback through an animated user interface reflecting the user's behavior: a slow motion for a static condition, and a more dynamic one for increased activity. In this work we aim to stimulate the subject's PA by prompting, based on PA self-monitoring through the Actimonitor Android app.
The following sections provide details of the HAR classification process, discussing current practices within the literature. These include data collection, preprocessing, feature extraction, and classification steps.

Problem Definition.
Resorting to [32] and borrowing the same notation, the HAR problem (HARP) may be formally defined, starting from sensor data collected and indexed over the time dimension and assuming nonsimultaneous activities.
Definition 1 (HARP). Given a set $S = \{S_0, \ldots, S_{k-1}\}$ of $k$ time series, each one from a particular measured attribute and all defined within time interval $I = [t_\alpha, t_\omega]$, the goal is to find a temporal partition $\langle I_0, \ldots, I_{r-1} \rangle$ of $I$, based on the data in $S$, and a set of labels representing the activity performed during each interval $I_j$ (e.g., sitting and walking). This implies that time intervals $I_j$ are consecutive, nonempty, and nonoverlapping, such that $\bigcup_{j=0}^{r-1} I_j = I$.
The very large (or even infinite) number of combinations of attribute values and activities, and their generally unknown duration, prevent a deterministic solution to the HARP and require the use of ML tools. A relaxed version of the problem is consequently introduced, in which time series are divided into fixed-length time windows, as follows.
Definition 2 (relaxed HARP). Given a set $W = \{W_0, \ldots, W_{m-1}\}$ of $m$ time windows $W_i$ having the same size and being totally or partially labeled, such that each $W_i$ contains a set of time series $S_i = \{S_{i,0}, \ldots, S_{i,k-1}\}$ from each of the $k$ measured attributes, and a set $A = \{a_0, \ldots, a_{n-1}\}$ of activity labels, the goal is to find a mapping function $f : S_i \to A$ that can be evaluated for all possible values of $S_i$, such that $f(S_i)$ is as similar as possible to the actual activity performed during $W_i$.
The relaxation introduces some errors into the model, which are however negligible for most applications. A relevant approach in activity recognition is to combine the output of different models to produce more accurate predictions. This leads to multiclassifier systems, which are shown to be effective, at the expense of an increase in computational complexity. The formal definition of combining predictions from several learners is as follows.
Definition 3 (HARP with multiclassifier). Given a classification problem with a feature space $X \in \mathbb{R}^{n}$ and a set of classes $\Omega = \{\omega_0, \ldots, \omega_{n-1}\}$, an instance $x \in X$ to be classified, and a set of predictions $S = \{s_0, \ldots, s_{k-1}\}$ for $x$ from $k$ classifiers, the goal of a multiclassifier system is to return the correct label $\omega^{*}$ iff $\exists\, s_i \in S \mid \omega^{*} = s_i$.
Some of the challenges faced in activity recognition are common to other fields too, but there are several specific issues for which dedicated computational methods have been developed. The recognition of highly diverse human activities requires selecting and combining several heterogeneous sensors that can be dynamically added or removed, based on application-driven requirements. Suitable metrics are finally defined to evaluate the HAR system performance. Table 1 summarizes the options in HAR system design and implementation.
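Returning to Definition 3, one simple way to combine the predictions of several classifiers is majority voting. The following Python sketch is only an illustration: the combination rule, the function name `majority_vote`, and the tie-breaking policy are assumptions, not the scheme prescribed by [32].

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers.

    Ties are resolved in favour of the label that appears first
    in `predictions` (an arbitrary, illustrative choice).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in the original order so that ties are deterministic.
    for label in predictions:
        if counts[label] == best:
            return label


# Three hypothetical classifiers voting on the same instance.
votes = ["walking", "upstairs", "walking"]
combined = majority_vote(votes)
```

By construction the returned label is always one of the individual predictions, so the combined output can be correct only if at least one base classifier is.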

Activities.
Activities recognized from sensor data can be classified in different ways, for example, in terms of their complexity. A simple locomotion activity could be walking, jogging, walking downstairs, taking an elevator, and so forth. Complex activities are usually related to a combination of actions (e.g., taking a bus and driving) but may even correspond to the movements of certain body parts (e.g., typing and waving a hand). Some activities may refer to the general context of healthcare, such as falling, exercise, and rehabilitation. Location-based activities include dining, shopping, and watching movies. Vision-based activities include leaving or entering a place. An IR sensor could detect a user moving or being still, whereas a home assisting robot could understand when the person is sleeping, taking pills, or cleaning [4,33,34].
Solutions developed for HAR must be robust to "intraclass" and "interclass" variability, the former occurring when the same activity is performed differently by different individuals, or even by the same individual at different times, and the latter due to data showing very similar characteristics even if belonging to fundamentally different classes. When not all the data in a continuous stream are relevant for HAR, the so-called NULL class problem may occur, which is difficult to model, as it represents a theoretically infinite space of arbitrary activities. A taxonomy of the most common activities targeted by HAR systems is summarized in Table 2.
Table 3 (fragment). [...]: measures gravity force applied to the device, along three axes (x; y; z). Rotation: measures the orientation of a device by providing the 3 elements of the device's rotation vector.

Sensors, Data Preprocessing, and Segmentation.
Sensors are the source for raw data collection in activity recognition, and they may be classified into three categories: video, environmental, and wearable sensors.
Wearable sensors are small mobile devices designed to be worn on the human body during daily activities. They can record users' physiological states, such as location changes, moving directions, and speed. Many wearable sensors are available on-board smartphones: Table 3 summarizes real (hardware) and virtual (software) sensors that are provided in current mainstream mobile devices [4,34,35].
Due to the intrinsic characteristics of accelerometers, the sensor orientation and the way the device is carried by the subject may heavily affect the raw data values. The most common positions of worn sensors used in the literature are hand-held, on the belt, in the pants pocket, and on the pelvic area. Sensitivity to orientation may be addressed by adding another sensor, through an aggregation technique.
Raw data collected from sensors are preprocessed to reduce the effects of noise by means of filtering methods, like average smoothing. Additionally, preprocessing enables data synchronization when samples arrive from multiple sensors, as well as artifact removal.
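As an illustration of the average smoothing mentioned above, the following Python sketch applies a centred moving average to one axis of a signal. The window width and the handling of the borders (shrinking the averaging window at the edges) are assumptions made for this example, not parameters taken from the paper.

```python
def moving_average(samples, width=3):
    """Smooth a 1-D signal with a centred moving average of odd width.

    Near the borders the averaging window is shrunk to the samples
    that are actually available.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        window = samples[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A noisy, alternating signal is flattened towards its local mean.
noisy = [0.0, 3.0, 0.0, 3.0, 0.0]
smooth = moving_average(noisy, width=3)
```

A wider window removes more noise but also blurs short, genuine movements, so the width is a trade-off that depends on the sampling frequency and on the activities of interest.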
Preprocessing of wearable sensor signals like acceleration may involve calibration, unit conversion, normalization, resampling, synchronization, or signal-level fusion [36]. Data segmentation allows identifying segments of the preprocessed data streams that are likely to contain information about activities (activity detection or spotting). This is usually a critical step in HAR, due to the intrinsic complexity of separating and identifying activities that humans typically perform with no separation in time.

Features and State-of-the-Art Algorithms.
Activity recognition relies on processing features that are extracted and selected from signals, through proper operations like conversion or transformation, to and from different domains. Feature computation may be automatic or derived from expert knowledge. A "feature space" is composed of all the features extracted from the data. If the extraction has been well executed, features corresponding to the same activity should appear clearly clustered in the space, and they should be clearly separated if pertaining to different activities. Similarly, selected features are good if they are robust to intraclass variability and to different subjects performing the same activity. A wide range of features have been identified in the literature, according to the data type from which they are extracted. Among them, it is possible to mention the following: signal-based, body-model, event-based, and multilevel features. Another classification of the features is based on the domain to which the inspected data pertain, as detailed in Table 4.
In order to limit the computational complexity of the classification process and the amount of training data needed for parameter estimation, the feature space dimensionality should be kept to a minimum, by identifying the core set of features that still allows targeting the desired performance. This also reduces the memory and bandwidth requirements for real-time processing on embedded systems.
Table 5 (fragment). Boosting and bagging [56,57].
Once the most effective features are extracted, a fundamental part of a HAR system is the algorithm needed to classify new instances of recorded data [37]. The algorithm that outputs the classification label is represented by a model that has to be trained. Several inference methods have been proposed in ML and computational statistics, as listed in Table 5. In supervised ML algorithms, a function is inferred from a set of ground truth-labeled training examples, with the aim of minimizing the classification error and being able to map new examples (the testing ones).

Datasets.
The lab and real-world recorded datasets upon which this work is based are taken from [38] and were used by Kwapisz et al. in [39]. They contain accelerometer data recorded with Android smartphones placed in the pants pocket, at an average sampling frequency of 20 Hz. The collection is divided into two parts: a smaller one recorded in a controlled laboratory environment and a bigger one recorded and labeled by the users in real-world settings.
The lab dataset was recorded using three types of smartphones: Nexus One, HTC Hero, and Motorola Backflip. The 36 volunteers performed a specific set of activities (walking, jogging, ascending and descending stairs, sitting, and standing for given periods of time) while carrying an Android smartphone in their front pants leg pocket. The total length in time of the dataset recordings is ≈15 h, corresponding to an average of ≈25 min of recording for each user.
The real-world dataset was recorded and labeled freely by the users during their everyday life and without a specific protocol. For this dataset there is also some demographic information about almost all the 563 users (372 males and 191 females) involved in the test, summarized in Figures 1, 2, and 3. Among the users, 67 reported having an injury affecting the way they walk. The total length in time of this dataset is ≈42 h. In this dataset, the upstairs and downstairs activities are grouped into Stairs, and a new activity is introduced, that is, lying down. Since both datasets are recorded with a smartphone, the sampling frequency is not regular and can vary during the recordings. The main problem is that the classifier, to perform at its best, has to work on features extracted from time windows with the same sampling frequency. Some features can vary too much when the sampling frequency is different, even if calculated for the same activity. For this reason, all the time windows whose mean sampling frequency deviates from the target value by more than ±2 Hz have been discarded.
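The sampling-frequency check can be sketched in Python as follows. The example reads the ±2 Hz rule as keeping only the windows whose mean sampling frequency stays within 2 Hz of the 20 Hz target; the millisecond timestamps and the function names are assumptions made for illustration.

```python
def mean_sampling_frequency(timestamps_ms):
    """Mean sampling frequency (Hz) of a window, from timestamps in ms."""
    if len(timestamps_ms) < 2:
        raise ValueError("at least two samples are required")
    duration_s = (timestamps_ms[-1] - timestamps_ms[0]) / 1000.0
    if duration_s <= 0:
        raise ValueError("timestamps must be strictly increasing")
    # n samples span n - 1 sampling intervals.
    return (len(timestamps_ms) - 1) / duration_s


def keep_window(timestamps_ms, target_hz=20.0, tolerance_hz=2.0):
    """True if the window's mean frequency is within the tolerance."""
    deviation = abs(mean_sampling_frequency(timestamps_ms) - target_hz)
    return deviation <= tolerance_hz


regular = [i * 50 for i in range(100)]   # one sample every 50 ms, about 20 Hz
slow = [i * 100 for i in range(100)]     # one sample every 100 ms, about 10 Hz
```

With these inputs `keep_window(regular)` accepts the window and `keep_window(slow)` rejects it.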
Another aspect to consider is the windowing technique: since all the raw data are written sequentially to a text file, it is important to split them correctly, because each extracted time window has to belong to a single user and a single activity. Since overlapping has not been used, the designed solution was to incrementally fill in the window during the file reading operation (line by line, since the lines are ordered in time) and truncate the window when an activity or user change is detected. All the windows with less than 90% of the target samples have been discarded. Table 6 summarizes the configuration used to process the datasets. The calculated features are then written to a file ready to be used with the WEKA toolkit. Figures 4 and 5 summarize the class distribution of the processed datasets, once the noncompliant time windows have been discarded.
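The windowing procedure described above can be sketched in Python as follows. The row layout `(user, activity, sample)`, the function name, and the default of 128 target samples are assumptions for this example; the actual configuration is the one reported in Table 6.

```python
def split_windows(rows, target_samples=128, min_fill=0.9):
    """Split time-ordered rows into non-overlapping windows.

    `rows` is an iterable of (user, activity, sample) tuples.
    A window is closed when it is full or when the user or the
    activity changes; truncated windows holding fewer than
    `min_fill * target_samples` samples are discarded.
    """
    windows, current, key = [], [], None
    min_samples = min_fill * target_samples

    for user, activity, sample in rows:
        if key is not None and (user, activity) != key:
            # User or activity changed: truncate the window being filled.
            if len(current) >= min_samples:
                windows.append((key, current))
            current = []
        key = (user, activity)
        current.append(sample)
        if len(current) == target_samples:
            windows.append((key, current))
            current = []

    # The last, possibly incomplete, window.
    if len(current) >= min_samples:
        windows.append((key, current))
    return windows


# 39 walking samples followed by 5 sitting samples, 20 samples per window.
rows = [("u1", "walking", i) for i in range(39)]
rows += [("u1", "sitting", i) for i in range(5)]
result = split_windows(rows, target_samples=20)
```

In this example the first window is full, the second is truncated at the activity change but kept because it holds 19 of the 20 target samples, and the trailing 5 sitting samples are discarded.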

Training and Test Methods.
To train and test the different classifiers, a common procedure has been used. Each classifier was selected through the Classify tab of the WEKA Explorer, and a standard 10-fold cross-validation was used to obtain more reliable results.
No particular instance filtering techniques have been used; all the windows available for a given activity have been used. The classifiers have been evaluated in terms of performance, training time, interpretability of the generated model, and model file size. This last characteristic is relevant to the aim of implementing the classifier in a mobile application. In fact, once trained, the model has to be serialized, stored in the smartphone app, and deserialized. In particular, performance has been evaluated through the most common indexes used in ML (precision, recall, and F-score) but also by analysing the obtained confusion matrices, to investigate possible classification problems and algorithm shortcomings.
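The three indexes can be derived directly from a confusion matrix. The following Python sketch uses the standard definitions; the matrix layout (rows are actual classes, columns are predicted classes) and the sample values are illustrative assumptions, not results from the paper.

```python
def per_class_metrics(confusion, labels):
    """Precision, recall, and F-score for each class.

    confusion[i][j] is the number of instances of class i
    that were predicted as class j.
    """
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        predicted = sum(row[k] for row in confusion)  # column sum
        actual = sum(confusion[k])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        if precision + recall:
            f_score = 2 * precision * recall / (precision + recall)
        else:
            f_score = 0.0
        metrics[label] = (precision, recall, f_score)
    return metrics


labels = ["sitting", "walking"]
confusion = [[8, 2],   # 8 sitting windows correct, 2 predicted as walking
             [1, 9]]   # 1 walking window predicted as sitting, 9 correct
scores = per_class_metrics(confusion, labels)
```

The off-diagonal cells of the matrix show which pairs of activities are confused with each other, which is the information the aggregate indexes alone do not provide.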

Implementation of the Mobile App for the Rapid Design of Physical Activity Monitoring Algorithms in the Workplace.
The main use cases for the developed Actimonitor Android application are tracking the user's PA and recording and collecting new data, with the aim of creating and populating a new dataset that is more complete than the one retrieved from the literature [38]. The mobile application is just a part of a bigger intended system, designed to collect data from many users, so it also has to allow the management of a user profile (setting password, email, and login). Moreover, the app is also intended to synchronize the data collected and stored on the smartphone with a remote web server.

Data Acquisition and Manipulation.
Classes dealing with data acquisition and manipulation are the most important ones, because they are responsible for retrieving data from sensors, calculating the features, classifying the instance, and writing the final results to the application internal database and to files. The main class that manages all the operations is BackgroundSensorsService, which extends a regular Android Service class and whose structure is shown in Figure 6. The class is created and started once the user activates the recording, and it runs in the background performing the sensor data collection and activity tracking, even once the app is closed. The class can be started or stopped only through the app. It also performs the loading and deserialization of the WEKA trained model.
The tracking is active only when the screen is off and the proximity sensor detects a near object (i.e., the smartphone is recognized as being in the pocket). A separate thread is in charge of writing the collected data to files (both raw and processed data), managing the creation of the directories where the files will be stored, and updating the smartphone file system.
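The idea of delegating file writing to a separate thread can be sketched as a producer-consumer pair. The Python example below is a language-neutral illustration, not the Android implementation: the queue, the `None` stop signal, and the in-memory list standing in for a file are all assumptions.

```python
import queue
import threading


def writer(samples, sink):
    """Drain the queue and persist each item; None is the stop signal."""
    while True:
        item = samples.get()
        if item is None:
            break
        sink.append(item)  # stand-in for a write to a file


samples = queue.Queue()
stored = []
thread = threading.Thread(target=writer, args=(samples, stored))
thread.start()

# Acquisition side: hand over each reading without waiting for the write.
for reading in [(0, 0.1, 9.8, 0.3), (50, 0.2, 9.7, 0.4)]:
    samples.put(reading)

samples.put(None)  # ask the writer to stop
thread.join()
```

Because the acquisition side only enqueues readings, slow storage operations do not delay the collection of the next samples.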
Performance of the app is an important requirement, in particular with respect to the real-time constraint. The problem is solved by splitting the processing and running it in different threads; this results in a larger real-time margin and does not affect the smartphone stability.

User Interface.
The app user interface is developed following the principles of Material Design [40]. The app allows the user to access functions from a top and a side selection menu (Figure 7). Clicking on an item in one of the two menus starts a new interface for the desired function or visualization.
In particular, in the current implementation, the side menu is used to access the settings of the app and the top menu provides all the actual functions. The different interfaces are implemented through independent Android fragments that replace each other.
The app is capable of classifying and tracking the activity of the user, using a previously trained classifier. The classifier is trained for the case of a user placing the smartphone in the trousers pocket, with the upper part of the device directed to the ground and the screen facing the leg. As a consequence, the user has to select the correct settings for the smartphone position, as shown in Figure 8. Then, by accessing the tracking section of the interface, the recording may be started, as shown in Figure 9. Afterwards, the user may put the smartphone in the pocket. The app will stop recording when it detects that the smartphone is not in a pocket anymore and will restart automatically when this condition is detected again. Also, a notification will appear on the smartphone lock screen.

Data Structures.
To meet all the requirements and easily manage the workflow of the system, it was necessary to design suitable data structures, to hold both the data collected from the sensors and those generated by the app itself. Moreover, the internal database structure also had to be designed, to permanently store the tracking data in an efficient way.
The internal database is mainly used to store the tracking and the training data (raw data), in separate tables. One of the main problems to address was the growing size of the database during everyday recordings. The database has therefore been designed to store only the essential data. In particular, the table that stores the tracking data contains only the absolute timestamp of the window, the label, and a small set of information about it. For the raw data, every row contains a timestamp, the values for the three axes of the accelerometer (optionally, also the barometric value), and a small set of additional information; 128 of these rows are stored for each window. Hence, the calculated features are not stored, since they would require a huge amount of memory and can easily be recalculated later, if necessary.
The size growth of the database was estimated by leaving the app running for many hours. It was possible to verify that, for activity tracking only, over a standard 8-hour working day, the growth is on the order of hundreds of KB, while, if the raw data are also saved, it could rise to ten MB. For these reasons, the user is left free to choose whether or not to store the raw data, via an option in the app settings. Moreover, assuming a regular synchronization with a remote web server at least once a week, the database size does not represent a problem. Another problem to address is the time required to write a group of tuples to the database at once. When using the helper classes provided by the Android SDK, this operation is very slow (≈1.5 s). For this reason, to store the 128 tuples of raw data per window at once, a low-level approach based on the SQLite JDBC driver was adopted. This way, it was possible to lower the time to only ≈100 ms.
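A common reason for slow bulk writes in SQLite is that each row is committed in its own transaction. The text does not state that this was the bottleneck here, so the following Python sketch, which writes the 128 rows of one window in a single transaction, is only an analogy to the low-level approach and not the authors' JDBC code; the table name `raw_data` and its columns are hypothetical.

```python
import sqlite3


def store_window(conn, rows):
    """Insert all the rows of one window inside a single transaction."""
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT INTO raw_data (ts, x, y, z) VALUES (?, ?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (ts INTEGER, x REAL, y REAL, z REAL)")

# One window of 128 hypothetical accelerometer rows, 50 ms apart.
window = [(i * 50, 0.0, 9.8, 0.1) for i in range(128)]
store_window(conn, window)

row_count = conn.execute("SELECT COUNT(*) FROM raw_data").fetchone()[0]
```

Grouping the rows also makes the write atomic: either the whole window is stored or none of it is, which avoids partially saved windows.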

Classification System Implementation.
The feature extraction phase is implemented by four classes (one for each group of features), to easily choose only a subset of them if necessary. These classes are called TimeDomainFeatureExtractor, FrequencyDomainFeatureExtractor, StructuralFeatureExtractor, and TransientFeatureExtractor. They all extend the abstract class FeatureExtractor. The computed features are stored, progressively for each subgroup, in the main FeatureSet class instance.
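The structure described above can be sketched in Python as follows. The class names FeatureExtractor, TimeDomainFeatureExtractor, and FeatureSet come from the text; the method names `extract` and `add_from`, the window layout, and the choice of mean and standard deviation as time-domain features are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from statistics import mean, pstdev


class FeatureExtractor(ABC):
    """Common base for all the groups of features."""

    @abstractmethod
    def extract(self, window):
        """Return a dict {feature_name: value} for one time window."""


class TimeDomainFeatureExtractor(FeatureExtractor):
    """Illustrative time-domain group: per-axis mean and std deviation."""

    def extract(self, window):
        features = {}
        for axis in ("x", "y", "z"):
            values = [sample[axis] for sample in window]
            features[f"mean_{axis}"] = mean(values)
            features[f"std_{axis}"] = pstdev(values)
        return features


class FeatureSet(dict):
    """Collects the features computed by each extractor in turn."""

    def add_from(self, extractor, window):
        self.update(extractor.extract(window))


window = [{"x": 0.0, "y": 9.0, "z": 1.0}, {"x": 2.0, "y": 11.0, "z": 1.0}]
feature_set = FeatureSet()
feature_set.add_from(TimeDomainFeatureExtractor(), window)
```

Keeping one class per group means that a group can be left out simply by not passing its extractor to `add_from`.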
The classification phase is the one for which the WEKA library has been used most extensively. Once the FeatureSet from the target time window is obtained, it is necessary to translate it into a WEKA Instance, which represents the single example to be classified. This has been done through the method toInstance(Instances es) of the FeatureSet class. This method takes as input a WEKA Instances object (which represents a dataset and its instances, not to be confused with Instance). In particular, the idea is to provide a dummy (empty) dataset with the same header as the one used to train the WEKA model. This way the software can check whether the FeatureSet contains all the required features, and throw an exception if not. The Instances object representing the dummy dataset is obtained by reading a simple file stored in the app resources.
Since both the dummy dataset and the model are stored in files, they have to be loaded at the startup of the app (precisely, at the startup of the BackgroundSensorsService). The WEKA tool provides helper classes for this kind of operation. It is possible to obtain a representation of the dataset by means of an Instances object. Moreover, to deserialize the model, the read method of the helper class SerializationHelper has been applied: by reading the model file, it gives back the already trained Classifier implementation for the stored model.
The classification is simply done by the classifier's method classifyInstance(Instance is), which takes as input the instance to classify and gives as output a double value ranging from 0 to $n - 1$, where $n$ is the number of classes. Algorithm 1 shows a simplified implementation of the classification process adopted, which clarifies the use of the already mentioned classes.
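The flow from a feature set to a class label can be sketched in Python without the WEKA library. Everything in this example is a stand-in: `to_instance` mimics the header check performed by toInstance, `ThresholdModel` plays the role of the deserialized classifier, and the header and class labels are hypothetical.

```python
def to_instance(feature_set, header):
    """Order the features as in the training header.

    Raises KeyError if a feature required by the header is missing.
    """
    missing = [name for name in header if name not in feature_set]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [feature_set[name] for name in header]


class ThresholdModel:
    """Hypothetical stand-in for a trained, deserialized classifier."""

    def classify_instance(self, instance):
        # Returns the class index as a floating-point value.
        return 1.0 if instance[1] > 0.5 else 0.0


header = ["mean_x", "std_x"]            # header of the training dataset
class_labels = ["sitting", "walking"]   # class i is class_labels[i]

instance = to_instance({"std_x": 0.9, "mean_x": 0.1}, header)
index = ThresholdModel().classify_instance(instance)
label = class_labels[int(index)]
```

The numeric output is only an index into the class list of the training header, so the same header must be used when training the model and when decoding its output.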

Base Classifier Selection.
A first step in the classifier selection was the choice of the base one to start from. Therefore, before moving to the implementation of the mobile app, [...]
Finally, since the selected model has to be serialized and stored in a smartphone app, the file size is also an important characteristic. Both the DT and the NN produce a model of ≈1 MB size, with a slightly bigger one for the NN. k-NN gives a model of ≈10 MB. In fact, the disadvantages of lazy learning include the large space required to store the entire training dataset. Moreover, particularly noisy data increase the needed set unnecessarily, because no abstraction is made during the training phase.
Figure 10 shows a summary of the characteristics of the three classifiers. The DT gives a good compromise among all the considered characteristics, traded off with performance. Hence, the DT is the base classifier chosen to build up the final PA monitoring system, to be implemented in the mobile app tool.

Activities Misclassification Problem.
As stated above, the main performance loss is due to the wrong classification of some activities. Classes in this subset will be called small displacement activities, in contrast with the static activities (sitting and standing) and the big displacement activity (jogging). Activities in this subset feature very similar characteristics and movements; consequently, the extracted features also have similar values, leading to a weak and error-prone model. Another problem, which emerges mainly in the test phase, is the wrong classification of the slow walking activity. This is probably due to the fact that walking was recorded in a controlled setup, that is, on a treadmill at a fixed pace, so a walking activity with a different or irregular pace is sometimes misclassified as upstairs or downstairs. For these reasons, there clearly emerges the need for a more accurate and stronger model, not only to better distinguish between the small displacement activities but also to generalize the walking data and recognize a larger range of paces. To produce a better classifier, two solutions have been tested, a hierarchical approach and a generalized data approach, and one of them has been chosen.

Hierarchical Approach.
The first approach tested is the hierarchical one, which allows splitting the original problem into multiple, smaller subproblems. The idea is to create simpler but stronger classifiers, each selecting among a few activities. Figure 11 shows the two hierarchical schemes proposed and tested, but only the results of Figure 11(b) are reported, because it performs better than Figure 11(a). The selected hierarchical classifier is based on simple DTs, the dimensions of which are related to the difficulty in classifying the target activities. As an example, the model used for walking-upstairs has an average number of nodes equal to 90, while the jogging-standing one has only 3 nodes.
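The general idea of a hierarchical classifier can be sketched in Python as a two-level decision: a first model picks a group of activities and a second, group-specific model picks the activity. The grouping into static, small displacement, and big displacement activities follows the terminology of this paper, but the structure below is not the scheme of Figure 11, and all the models and thresholds are hypothetical stand-ins for trained decision trees.

```python
def classify_hierarchical(features, group_model, leaf_models):
    """Pick an activity group first, then let that group's model decide."""
    group = group_model(features)
    return leaf_models[group](features)


# Hypothetical stand-ins for trained decision trees.
def group_model(f):
    if f["std"] < 0.5:
        return "static"
    return "big" if f["std"] > 5.0 else "small"


def static_model(f):
    return "sitting" if f["mean_y"] < 5.0 else "standing"


def small_displacement_model(f):
    if f["mean_y"] > 9.0:
        return "walking"
    return "upstairs" if f["mean_z"] > 0.0 else "downstairs"


leaf_models = {
    "static": static_model,
    "small": small_displacement_model,
    "big": lambda f: "jogging",
}

standing = classify_hierarchical(
    {"std": 0.1, "mean_y": 9.8, "mean_z": 0.0}, group_model, leaf_models)
jogging = classify_hierarchical(
    {"std": 6.0, "mean_y": 0.0, "mean_z": 0.0}, group_model, leaf_models)
```

Each leaf model only has to separate the few activities of its own group, which is what keeps the individual trees small.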
Considering that the main impact on performance was given only by the small displacement activities, the model that involves these three activities is the most important one, since it drives the overall performance. Tables 10(a), 10(b), and 10(c) show the performance of this smaller classifier; considering the F-score values, it is clear that a "smaller" model brings no improvement. The real problem is in the data chosen to train the classifier and not in the classification scheme used. In particular, the training data have to be generalized.
One of the solutions to generalize the training data is to also use the data from the real-world dataset, since they are collected by the users in real-life conditions and can give a better and more comprehensive model of the activities. Assuming that the most problematic activities are the ones related to walking actions and that the real-world dataset does not contain the upstairs and downstairs classes, only the walking windows have been used. Figure 12 shows the new class distribution, once the walking time windows taken from the real-world dataset have been included. Even if the class distribution is now strongly unbalanced, adding new time windows is necessary to identify all the different walking patterns and paces as walking, as confirmed by experiments.
The new results provided in Table 11 show an improvement of the weighted average F-score value, from 0.864 to 0.923, but two classes, upstairs and downstairs, are affected by a noticeable performance loss. The reason is that some users collected the data by placing the smartphone with a different orientation compared to the laboratory dataset. This fact can give rise to misclassification and performance reduction for those activities particularly affected by the smartphone position and orientation. The model therefore needs to be strengthened, to restore the classification performance on the upstairs and downstairs activities while keeping the generalization already introduced.

Robustness Improvements.
The classifier has to be made more insensitive to the varying smartphone orientation and to possible labeling errors for the real-world dataset. It also has to be less sensitive to the way different people perform the same activity, while still retaining the class separation properties. To this aim, bootstrap aggregating and pairwise classification have been used. The former is an ensemble learning technique, also known as bagging, which was applied to DT classifiers. The latter decomposes the classifier into several two-class problems and combines their outputs through voting techniques (as for bagging). If the classes are evenly populated, a pairwise classifier is at least as quick to train as any other multiclass method [41,42]. Moreover, since the DT performs an intrinsic feature selection, and taking into account the difficulty of distinguishing the two selected activities, having a simpler two-class classifier allows creating models of acceptable complexity. The number of nodes in the DTs has the same order of magnitude as discussed before.
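How bagging and pairwise classification combine through voting can be sketched as follows. This is a minimal illustration under stated assumptions, not the WEKA-based implementation used in the app: a nearest-centroid rule (`train_centroid`) replaces the DT as base learner to keep the example short, and the function names, the number of bagged models, and the toy two-dimensional data are all hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def train_centroid(X, y):
    """Toy base learner (stand-in for a DT): predict the nearest class centroid."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
    return predict


def train_bagged(X, y, n_models=11, seed=0):
    """Bootstrap aggregating: fit each model on a resampled copy, then vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_centroid([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict


def train_pairwise(X, y, n_models=11, seed=0):
    """Pairwise classification: one bagged two-class model per pair of classes."""
    pair_models = []
    for a, b in combinations(sorted(set(y)), 2):
        rows = [(xi, yi) for xi, yi in zip(X, y) if yi in (a, b)]
        pair_models.append(
            train_bagged([r[0] for r in rows], [r[1] for r in rows], n_models, seed)
        )

    def predict(x):
        # Every pairwise model casts one vote; the most voted class wins.
        return Counter(m(x) for m in pair_models).most_common(1)[0][0]
    return predict


def toy_cluster(cx, cy):
    """Ten invented feature vectors scattered around (cx, cy)."""
    return [(cx + dx * 0.1, cy + dy * 0.1) for dx in range(-2, 3) for dy in (-1, 1)]


X, y = [], []
for label, (cx, cy) in {"walking": (0, 0), "upstairs": (5, 5), "downstairs": (10, 0)}.items():
    pts = toy_cluster(cx, cy)
    X += pts
    y += [label] * len(pts)

model = train_pairwise(X, y)
```

With three classes, three two-class models are trained; a sample of a given class is expected to win both comparisons that involve its class, so the vote of the remaining unrelated pair does not change the outcome.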

Final Results on the Dataset.
Once the described methods and techniques are applied, it is possible to appreciate an overall performance improvement of the weighted average F-score, up to a value of 0.988. In particular, it changes from 0.712 and 0.7 to 0.676 and 0.71, for upstairs and downstairs, respectively.
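For reference, the weighted average F-score used throughout this evaluation can be computed from a confusion matrix as sketched below. The function name `f_scores` and the example counts are hypothetical and are not taken from the paper's tables; the weighting by class support follows the usual definition, which is assumed here to match the one reported by the toolkit.

```python
def f_scores(confusion, labels):
    """Per-class F-score and support-weighted average.

    confusion[i][j] is the number of windows of true class i predicted as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    per_class, weighted = {}, 0.0
    for k in range(n):
        tp = confusion[k][k]
        support = sum(confusion[k])                      # windows truly in class k
        predicted = sum(confusion[i][k] for i in range(n))  # windows assigned to k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f = 2 * precision * recall / denom if denom else 0.0
        per_class[labels[k]] = f
        weighted += f * support / total
    return per_class, weighted


# Invented, strongly unbalanced two-class example.
example_labels = ["upstairs", "walking"]
example_confusion = [
    [8, 2],     # true upstairs
    [10, 180],  # true walking
]
per_class, weighted_avg = f_scores(example_confusion, example_labels)
```

Because each class contributes in proportion to its number of windows, a weak score on a sparsely populated class has a limited effect on the weighted average, which is worth keeping in mind when the class distribution is unbalanced.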
An important aspect to analyse is the sensitivity of the algorithm to the smartphone orientation. To test it, both datasets have been processed by inverting the axes of the accelerometer, simulating the smartphone in a different orientation (upside down). The trained model has then been tested with these data. As expected, the performance dropped to an F-score of 0.729, due to upstairs and downstairs, but in particular to the standing class, where the orientation highly affects the classification.
For these reasons it was useful to create a reduced model with only two activities. Considering the goal of monitoring the PA in a workplace, the minimum target is to distinguish between not active (sitting) and active (all the other activities). Since the class distribution would otherwise become even more unbalanced, the sitting instances from the real-world dataset have also been added to the test and training sets; the new class distribution is shown in Figure 13. This setup shows improved performance, according to Table 12, and achieves an average weighted F-score of 0.988, even for the test done with the upside-down smartphone orientation. Given these results, the model is suitable to be implemented even when the smartphone orientation is not fixed, or when there is no information about the orientation at all. A further solution could be to estimate the orientation from features that are not affected by orientation, like the acceleration magnitude [43][44][45], and then use a different classifier for each relevant position.
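The reason why the acceleration magnitude is not affected by orientation can be checked with a short sketch. It assumes that the upside-down condition is simulated by negating accelerometer axes; which axes are inverted in the experiments is not stated, so negating all three is an assumption, and the function names and sample values are hypothetical.

```python
import math


def invert_axes(window, axes=(0, 1, 2)):
    """Simulate a flipped device by negating the selected accelerometer axes."""
    return [tuple(-v if i in axes else v for i, v in enumerate(s)) for s in window]


def mean_per_axis(window):
    """Per-axis mean, an example of an orientation-dependent feature."""
    n = len(window)
    return tuple(sum(s[i] for s in window) / n for i in range(3))


def magnitude_features(window):
    """Mean and standard deviation of |a| = sqrt(x^2 + y^2 + z^2)."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
    mean = sum(mags) / len(mags)
    var = sum((m - mean) ** 2 for m in mags) / len(mags)
    return mean, math.sqrt(var)


# Invented samples (m/s^2) for a device at rest with gravity along the y axis.
window = [(0.1, 9.7, 0.3), (0.2, 9.9, 0.1), (-0.1, 9.8, 0.2)]
flipped = invert_axes(window)

axis_mean, flipped_axis_mean = mean_per_axis(window), mean_per_axis(flipped)
mag, flipped_mag = magnitude_features(window), magnitude_features(flipped)
```

The per-axis means change sign after the flip, while the magnitude features stay the same because each component enters only through its square.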

Conclusion
The aim of this research was to address the easy and rapid development and testing of classifiers to be used in physical activity monitoring systems, targeted at individuals in their workplaces. Such a result has been obtained through the design of an Android-based mobile application, which also demonstrates the feasibility of implementing even complex HAR solutions on mainstream devices like smartphones. The paper offered a complete overview of the activity classification problem from multiple points of view, from the types of sensors to use and their position to a taxonomy of the most used techniques in this field. A mobile app for the design of HAR algorithms has been presented, using state-of-the-art tools for machine learning, like the WEKA toolkit. It was then possible to offer an extensive performance evaluation of some of the most common classifiers. Afterwards, an analysis of the main problems encountered was carried out. This led to the design of a system capable of overcoming these problems in a simple and effective way.
In particular, the implemented HAR system was designed to be used in a workplace environment to monitor the physical activity of the workers, and it is capable of distinguishing between six activities. The chosen classifier is based on a hierarchy of DTs to which some improvements, like bagging and pairwise classification, have been applied to increase the average performance. The final classifier reaches a weighted average F-score of around 92%. A simplified approach capable of distinguishing between active and not active states was also studied; it is much less sensitive to the device orientation and reaches an F-score of up to 99%. This last approach could be used, for example, in an elderly monitoring system, since it guarantees very high performance on the classification of simple activities, useful to detect the state of the patient in a room.
The final algorithm has been tuned by exploiting rapid test and development through an Android smartphone application developed ad hoc. All the software components in the app have been implemented following the Android design patterns and using the WEKA API as the core of the classification system. Such an app sets the basis for the design of new algorithms, since it allows a very easy replacement of the classifiers, simply by changing the file in which they have been serialized. It could be a practical and easy tool for benchmarking new algorithms, applicable and extensible even to domains other than PA monitoring. The app is also able to record the collected sensor data, to create new and richer datasets for HAR in the future.

Figure 1: Age distribution among the subjects for the real-world recorded database.

Figure 2: Height distribution among the subjects for the real-world recorded database.

Figure 5: Class distribution for the real-world recorded processed dataset.

Figure 8: Selection of the smartphone position setting through the application interface.

Figure 10: Summary of the characteristics shown by each base classifier.

Figure 11: Hierarchical models used to solve the misclassification problem.

Figure 12: Resulting class distribution with the walking time windows taken from both laboratory and real-world dataset.

Figure 13: Resulting class distribution with the walking and the sitting time windows taken from both laboratory and real-world dataset.

Table 1: Main characteristics of HAR systems.

Table 2: The different categories and types of activities in the current literature.

Table 4: Common features used in HAR systems classified by domain.

Table 5: Taxonomy of classifiers proposed in state-of-the-art HAR systems.

Table 6: Summary of the time windows and processing parameters.

Table 10: Small displacement subclassifier performances.

Table 11: Classifier performance when trained with generalized walking data.

Table 12: Final bagging active/not active classifier performance, also considering the same classifier with a different smartphone position (upside down).