A novel methodology, the double layer methodology (DLM), for modeling an individual’s lifestyle and its relationships with health indicators is presented. The DLM is applied to model behavioral routines emerging from self-reports of daily diet and activities, annotated by 21 healthy subjects over 2 weeks. Unsupervised clustering on the first layer of the DLM separated our population into two groups. Using eigendecomposition techniques on the second layer of the DLM, we could find activity and diet routines, predict behaviors in a portion of the day (with an accuracy of 88% for diet and 66% for activity), determine between-day and between-individual similarities, and detect an individual’s group membership based on behavior (with an accuracy of up to 64%). We found that clustering based on health indicators was reflected in activity behaviors, but not in diet behaviors. In addition, we showed the limitations of eigendecomposition for lifestyle applications, in particular when applied to noisy and sparse behavioral data such as dietary information. Finally, we proposed the use of the DLM for supporting adaptive and personalized recommender systems for stimulating behavior change.
Managing health requires a holistic understanding of the individual’s behavior [
In this work, we propose the use of machine learning and eigendecomposition techniques to detect individuals’ routines captured from self-reporting of daily diet and daily activities and to find behavioral correlates of health indicators. The main objectives of this work are to test the feasibility of the eigenbehavior technique (initially presented by Eagle and Pentland in 2009 [ propose a novel approach, referred here to as discuss the implications of our approach for the design of adaptive and personalized recommender systems.
After describing the study cohort (in Section
The subjects in this study were part of a bigger study of 75 subjects [
In this work, diary data and anthropometrics from only 21 of the subjects (gender: 8 males, 13 females) have been used because the other subjects did not provide sufficient annotation about their diet. The mean number of annotated days per subject was 10. A summary of the study population demographics and additional readings, such as VO2max, relative VO2max, fat mass, fat free mass, percentage of fat mass, rest metabolic rate (RMR), and basal energy expenditure (BEE), are reported in Table
Study population demographics and health indicators.
Variable | Unit | Mean ± standard deviation |
---|---|---|
Age | Years | |
BMI | kg/m2 | |
Weight | kg | |
Height | cm | |
VO2max | ml/min | |
Relative VO2max | ml/kg/min | |
RMR | kcal/min | |
BEE | kcal/day | |
Fat mass | kg | |
Fat free mass | kg | |
Percentage of fat | % | |
Daily activity and diet were manually annotated by the subjects in a table format, where the start and end time of the activity and time of food item consumption were also indicated. Preprocessing techniques, such as tokenization, word removal, spell checking, and lemmatization, were applied for the analysis of the provided annotations. Additionally, words were separated into two categories, one for diet and the other for activity and grouped into classes as shown in Table
Activity and diet classes.
Class | Examples of words included in the class |
---|---|
Activity classes | |
Entertainment/relax | Shop, travel, watch, game, play, computer, TV, movie |
Work/study | Exam, homework, read, work, lesson, university, lecture, school, study |
Sport | Run, sport, gym, hockey, swim, fitness, soccer, workout |
Social | Meet, friends, call, party, talk, phone, parent, visit |
Vehicle | Car, bus, train, taxi, drive |
None | — |
Others | Wait, household, pack, shower |
Walk | Walk |
Bike | Bike, cycle |
Diet classes | |
Fruit product | Fruit, orange, apple, banana, kiwi, sultana, pineapple, smoothie, juice |
Grain product | Noodles, oatmeal, muesli, bread, macaroni |
Composite product | Sandwich, pizza, soup, rice, pasta, lasagna, hamburger |
Vegetables | Cucumber, spinach, carrot, pumpkin, broccoli, tomato |
Meat product | Beef, bacon, meat, sausage, chicken, steak |
Snacks | Nut, pie, candy, ice cream, chocolate, cake, snack, cookie |
Alcohol drink | Beer, wine, alcohol |
Others | Butter |
Seafood | Fish, tuna, salmon |
Caffeine drink | Cola, tea, coffee, cappuccino |
Starchy product | Potato, chip, fries |
Dairy product | Shake, milk, cheese, yoghurt |
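The grouping of diary words into classes can be illustrated with a small keyword lookup. This is only a sketch: the dictionary below is a hypothetical excerpt built from the example words in the table, the function name `classify_tokens` is an assumption, and the spell checking and lemmatization steps mentioned in the text are not reproduced.

```python
import re

# Hypothetical excerpt of a word-to-class dictionary (words taken from the table above).
ACTIVITY_CLASSES = {
    "Sport": {"run", "sport", "gym", "hockey", "swim", "fitness", "soccer", "workout"},
    "Vehicle": {"car", "bus", "train", "taxi", "drive"},
    "Walk": {"walk"},
    "Bike": {"bike", "cycle"},
}


def classify_tokens(text, classes):
    """Return the sorted names of the classes whose keywords occur in the text."""
    # Simple lowercase tokenization; no spell checking or lemmatization is applied here.
    tokens = re.findall(r"[a-z]+", text.lower())
    found = {name for name, words in classes.items() if words.intersection(tokens)}
    return sorted(found)


print(classify_tokens("Bike to the gym, then swim", ACTIVITY_CLASSES))
```

Because there is no lemmatization in this sketch, inflected forms such as "running" would not match "run"; a real pipeline would normalize tokens first.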
After preprocessing, the activity and diet data were treated separately, representing two behavioral spaces. Temporal information was included by considering activity and diet annotations in different periods of the day. For activity classes, each day was divided into three periods: morning (P0, from 00:00 to 12:00), afternoon (P1, from 12:00 to 17:00), and evening (P2, from 17:00 to 24:00).
For diet classes, each day was divided into six periods: breakfast (P0, from 00:00 to 09:00), morning (P1, from 09:00 to 12:00), lunch (P2, from 12:00 to 14:00), afternoon (P3, from 14:00 to 17:00), dinner (P4, from 17:00 to 19:00), and evening (P5, from 19:00 to 24:00).
Successively, binary behavior matrices,
The behavior matrix for one subject with annotations provided for 14 days is shown in Figure
Binary behavior matrices for one subject who annotated a 14-day diary. (a) The activity behavior matrix, each column corresponding to the activity classes as in Table
Panel titles: (a) Daily activity behaviors; (b) Daily diet behaviors.
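A binary behavior matrix of this kind could be assembled as sketched below. The period boundaries are those given in the text; everything else is an assumption made for illustration: the function names, the period-major flattening order, the reduced class list, and the choice to assign a boundary hour (for example 12:00) to the later period.

```python
from bisect import bisect_right

# Period start hours from the text; the last period always ends at 24:00.
ACTIVITY_STARTS = [0, 12, 17]          # P0, P1, P2
DIET_STARTS = [0, 9, 12, 14, 17, 19]   # P0 ... P5


def period_index(hour, starts):
    """Map an hour in [0, 24) to the index of the period that contains it."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in [0, 24)")
    return bisect_right(starts, hour) - 1


def day_vector(annotations, starts, classes):
    """Flatten (hour, class) annotations into a binary vector of periods x classes."""
    vec = [0] * (len(starts) * len(classes))
    for hour, cls in annotations:
        p = period_index(hour, starts)
        vec[p * len(classes) + classes.index(cls)] = 1
    return vec


def behavior_matrix(days, starts, classes):
    """One row per annotated day, one column per (period, class) pair."""
    return [day_vector(day, starts, classes) for day in days]


classes = ["Work/study", "Sport", "Walk"]  # reduced class list for illustration
days = [[(9.0, "Work/study"), (18.5, "Sport")], [(13.0, "Walk")]]
for row in behavior_matrix(days, ACTIVITY_STARTS, classes):
    print(row)
```

Each row has `len(starts) * len(classes)` entries, so the same helper serves the activity space (three periods) and the diet space (six periods).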
Eigenbehavior analysis was proposed by Eagle and Pentland (2009) [
Recognizing dietary and activity behavioral patterns across an individual’s lifespan and identifying collective behaviors, such as the lifestyle of fitness enthusiasts or the lifestyle of sedentary people, is somewhat different from defining behavioral dynamics of individuals and communities in a social network. In particular, while it is trivial to make community distinctions, for example, to cluster students on the basis of the school they attend, in the case of lifestyle, considered as a correlate of health, the definition of groups is an ethically delicate and ambiguous problem. This is because, in real-life scenarios, people are generally reluctant to be clustered and can often exhibit behaviors that are common to several groups; for example, not all people who eat sweets are obese. Additionally, the definition of health is generally not binary; for example, even within a population of people with chronic diseases, some will be healthier than others according to their physical and mental ability to adapt and self-manage [
To overcome this problem, we propose the
Illustration of the proposed
The DLM, presented in this work, was implemented in Python and organized in the following steps (also summarized in Figure
Summary of the steps in the DLM. At each level of the DLM (bodily and behavior), the different steps are numbered as explained in the main text. Steps adopted per individual and group analysis are separated by the thick blue line.
The results in the following section are presented using the sequence of steps as defined above.
Unsupervised clustering applied to the health indicators listed in Table
(a) Silhouette scores against number of clusters for
Panel titles: (a) Cluster analysis; (b) Feature importance.
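For readers unfamiliar with silhouette scores, the sketch below computes the standard mean silhouette coefficient for a given labeling. It does not reproduce the clustering algorithm used in the study, which is not specified in this section; the function name and the one-dimensional toy data are assumptions for illustration only.

```python
import numpy as np


def silhouette_score(X, labels):
    """Mean silhouette coefficient for a labeling with at least two clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    groups = np.unique(labels)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = D[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == g].mean() for g in groups if g != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


X = [[0.0], [1.0], [10.0], [11.0]]  # toy one-dimensional "health indicator"
print(silhouette_score(X, [0, 0, 1, 1]))  # well-separated labeling
print(silhouette_score(X, [0, 1, 0, 1]))  # poorly separated labeling
```

Evaluating this score for labelings with different numbers of clusters and keeping the number with the highest score is one common way to choose the cluster count.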
Primary eigenbehaviors were computed for each individual and for each group as explained in Section
In Figure
(a) Three primary eigenbehaviors for an individual belonging to group 0. (b) Average groups behavior.
Panel titles: (a) Individual’s primary eigenbehaviors; (b) Average groups behavior.
In Figure
For both individuals and groups, and in both diet and activity spaces, a behavior reconstruction accuracy above 90% could be reached with a linear combination of the first five to ten eigenbehaviors. Interestingly, dietary behavior required fewer eigenbehaviors than activity behavior for the same reconstruction accuracy (see Figure
(a) Mean and standard deviation of daily reconstruction accuracy across days and individuals against the number of eigenbehaviors required for such reconstruction. (b) Mean reconstruction accuracy of group behaviors.
Panel titles: (a) Reconstruction accuracy for individuals’ daily behavior; (b) Reconstruction accuracy for average group behavior.
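The reconstruction experiment can be sketched as follows, assuming that eigenbehaviors are the eigenvectors of the covariance of the mean-adjusted behavior matrix and that accuracy is the fraction of binary entries recovered after thresholding the reconstruction at 0.5. Both the accuracy definition and the function names are assumptions; the study may define them differently.

```python
import numpy as np


def eigenbehaviors(B):
    """Return the mean behavior and the covariance eigenvectors, largest eigenvalue first."""
    B = np.asarray(B, dtype=float)
    mean = B.mean(axis=0)
    cov = np.cov(B - mean, rowvar=False)    # needs at least two days and two features
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def reconstruction_accuracy(B, k):
    """Fraction of entries recovered when each day is rebuilt from k eigenbehaviors."""
    B = np.asarray(B, dtype=float)
    mean, vecs = eigenbehaviors(B)
    U = vecs[:, :k]
    recon = (B - mean) @ U @ U.T + mean     # project onto k eigenbehaviors and back
    binary = (recon >= 0.5).astype(int)     # assumed threshold for binarization
    return float((binary == B).mean())


# Toy behavior matrix: 6 days x 4 (period, class) columns.
B = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0],
     [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
for k in range(1, 5):
    print(k, reconstruction_accuracy(B, k))
```

With as many eigenbehaviors as columns the basis is complete, so the reconstruction is exact; the interesting quantity is how quickly the accuracy rises for small `k`.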
Results on the ability to predict an individual’s behavior during a portion of a day are reported in Figure
Prediction accuracy for the behavior during the last part of the day (P2, for activity; P3-P4-P5 per diet) given the behavior in the first part of the day (P0-P1 for activity, P0-P1-P2 for diet). An average of 66% accuracy is obtained for activity behavior, and 88% is obtained for diet behavior.
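One way such a prediction could be made is sketched below: the weights of the first `k` eigenbehaviors are fitted by least squares on the observed first part of the day, and the same weights are then used to fill in the remaining columns. This is an assumed procedure in the spirit of eigenbehavior analysis, not necessarily the one used in the study; the function name, the 0.5 threshold, and the column layout (known columns first) are all hypothetical.

```python
import numpy as np


def predict_rest_of_day(B_train, first_part, k):
    """Predict the unobserved columns of a day from its first len(first_part) columns."""
    B_train = np.asarray(B_train, dtype=float)
    first_part = np.asarray(first_part, dtype=float)
    n_known = len(first_part)
    mean = B_train.mean(axis=0)
    # Rows of Vt are the principal directions (eigenbehaviors) of the centered data.
    _, _, Vt = np.linalg.svd(B_train - mean, full_matrices=False)
    U = Vt[:k].T  # shape: features x k
    # Fit eigenbehavior weights using only the observed part of the day.
    w, *_ = np.linalg.lstsq(U[:n_known], first_part - mean[:n_known], rcond=None)
    pred = U[n_known:] @ w + mean[n_known:]
    return (pred >= 0.5).astype(int)  # assumed threshold for binarization


# Toy routine: the second half of each day mirrors the first half.
B_train = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
print(predict_rest_of_day(B_train, [1, 0], k=1))
print(predict_rest_of_day(B_train, [0, 1], k=1))
```

Prediction accuracy would then be the fraction of predicted entries that match the annotated ones, ideally evaluated on days held out from `B_train`.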
Day by day similarities were computed for each subject. Figure
(a) Heatmap representation of between-day Euclidean distances in activity and diet space for an individual belonging to group 0. (b) Percentage of overlap between the diet and activity day index vectors against the number of annotated days. Each point corresponds to data from a different individual.
Panel titles: (a) Individual’s day by day similarities; (b) Day similarities: overlap between activity and diet behavior.
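The between-day distances shown in such a heatmap can be computed as in the sketch below. It assumes the distances are taken directly between the binary day vectors; if the study computed them between eigenbehavior weights instead, the same function would apply to the projected days. The function name is hypothetical.

```python
import numpy as np


def day_distance_matrix(B):
    """Pairwise Euclidean distances between the rows (days) of a behavior matrix."""
    B = np.asarray(B, dtype=float)
    diff = B[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy example: days 0 and 1 are identical, day 2 differs in both columns.
B = [[1, 0], [1, 0], [0, 1]]
D = day_distance_matrix(B)
print(D)
```

The resulting matrix is symmetric with a zero diagonal, so only its upper triangle needs to be inspected when looking for similar days.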
Similarities between individuals were computed by projecting individuals in different groups and calculating the Euclidean distances between individual pairs.
We represented such distances for each subject using a dartboard-like representation as shown in Figure
Gamified illustration of distances between individuals. Individuals (a) and (b) are in the center of the dartboards. (a) is projected in group 1 (indicated by the red background); (b) is projected in group 0 (indicated by the blue background). Individuals belonging to group 1 and group 0 are represented by red and blue dots, respectively. Equally spaced rays represent distances of different subjects from the central subject ((a) or (b)). Gray concentric rings are equally spaced reference distances to facilitate distance perception (in each dartboard the distances between rings are the same). Green and black circles highlight the closest individuals, belonging to the same group or the other group, respectively.
Panel titles: (a) Individual from group 0 projected in group 1; (b) Individual from group 1 projected in group 0.
The Euclidean distances between the mean-adjusted behavior of the individual and its projection onto the group’s behavior for diet and activity spaces were computed as explained in Section
Figure
The cross-validated distance between individuals and the activity and diet behavior spaces of group 0 (a) and of group 1 (b). Dotted black line is the identity line used as reference for visual inspection.
Panel titles: (a) Individuals’ behavior projected in group 0 behavior space; (b) Individuals’ behavior projected in group 1 behavior space.
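The distance between a behavior and its projection onto a group's behavior space can be sketched as a projection residual. The assumptions here are that the behavior is mean-adjusted with the group mean, that the group space is spanned by the group's first `k` eigenbehaviors, and that an individual is assigned to the group with the smaller residual. The cross-validation mentioned in the caption is not implemented, and all names are hypothetical.

```python
import numpy as np


def distance_to_group_space(behavior, group_days, k):
    """Norm of the part of a mean-adjusted behavior lying outside the group's k-dim space."""
    G = np.asarray(group_days, dtype=float)
    mean = G.mean(axis=0)
    _, _, Vt = np.linalg.svd(G - mean, full_matrices=False)
    U = Vt[:k].T                      # group eigenbehaviors as columns
    x = np.asarray(behavior, dtype=float) - mean
    projection = U @ (U.T @ x)
    return float(np.linalg.norm(x - projection))


def closest_group(behavior, groups, k):
    """Return the key of the group whose behavior space is nearest to the behavior."""
    return min(groups, key=lambda g: distance_to_group_space(behavior, groups[g], k))


groups = {
    0: [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0]],
    1: [[0, 0, 1], [0, 1, 1], [0, 0, 1], [0, 1, 1]],
}
print(distance_to_group_space([1, 0, 0], groups[0], k=1))
print(distance_to_group_space([1, 0, 0], groups[1], k=1))
print(closest_group([1, 0, 0], groups, k=1))
```

In a cross-validated setting the individual's own days would be removed from `group_days` before the group space is estimated, so that the distance is not biased toward the individual's own group.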
The proposed health indicators-based clustering was compared with random clustering as explained in Section
Figure
Distribution of mean distances of an individual’s behavior from the behavior of his/her own group (a) and from the behavior of the other group (b), as obtained by shuffling individuals across groups. Background colors represent the group in which individuals are projected prior to distance computation (red for group 1 and blue for group 0). Dashed lines refer to the distances obtained by the proposed unsupervised clustering.
Panel titles: (a) Within-groups Euclidean distance distributions; (b) Between groups Euclidean distance distributions.
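The comparison against random clustering amounts to a label-shuffling baseline, sketched below. As a simplification, the distance statistic here is the mean distance of each individual from the centroid of its own group rather than from the group's eigenbehavior space; the function names, the number of shuffles, and the toy data are assumptions.

```python
import random


def mean_within_group_distance(points, labels):
    """Mean Euclidean distance of each point from the centroid of its own group."""
    total = 0.0
    for g in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == g]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        for p in members:
            total += sum((a - b) ** 2 for a, b in zip(p, centroid)) ** 0.5
    return total / len(points)


def shuffle_baseline(points, labels, n_shuffles=500, seed=0):
    """Distribution of the statistic when individuals are shuffled across groups."""
    rng = random.Random(seed)
    shuffled = list(labels)
    values = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # group sizes are preserved, membership is randomized
        values.append(mean_within_group_distance(points, shuffled))
    return values


points = [[0, 0], [0, 1], [10, 10], [10, 11]]  # toy individual behaviors
labels = [0, 0, 1, 1]                           # toy health-based clustering
observed = mean_within_group_distance(points, labels)
baseline = shuffle_baseline(points, labels)
fraction = sum(v <= observed for v in baseline) / len(baseline)
print(observed, fraction)
```

If a large fraction of shuffled groupings yields within-group distances as small as the observed one, the observed grouping cannot be distinguished from chance in that behavior space.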
The increasing availability of data from web, mobile, and wearable sensing in combination with the increasing usage of machine learning techniques is facilitating the employment of holistic approaches in the design of lifestyle applications, such as recommender systems for behavior change [
The computational steps at the first layer of the DLM revealed that our population could be clustered in two groups and that rest metabolic rate and fitness level could be considered relevant health indicators for clustering our study population. Resting metabolic rate is linearly related to the basal energy expenditure and depends on factors such as gender, age, weight, and height [
At the second layer of the DLM, the eigendecomposition technique was applied for detecting behavioral routines, and a validation of the health indicators-based clustering technique was proposed. We showed that, for our population, dietary routines were more regular than activity routines, both at the individual and at the group level. For the same reconstruction accuracy, fewer primary eigenbehaviors were needed for reconstructing diet behavior than for activity behavior. This outcome was also reflected in the higher accuracy of prediction of diet behaviors in a portion of the day (88%), compared to prediction of activity behaviors (66%). These results should be considered in the light of the assumption upon which eigenbehavior analysis is based: eigenvectors with large eigenvalues contain most of the data variability. It follows that a small number of primary eigenbehaviors corresponds to a higher regularity of a routine. This assumption might not necessarily hold for routines across different behavioral spaces, and it can lead to incorrect interpretation if the data are noisy and sparse.
When computing between-day similarities, we found that the overlap between diet and activity routines depended on the length of the observation period and that, over a period of 14 days, there was on average a chance of about 20% that two days that were similar in the activity space were also similar in the diet space. In a real scenario, overlap of routines across different behavioral spaces can occur, for example, if an individual is undergoing physical training for which daily consistency of exercise and diet timing is required. Identification of day similarities can be used to estimate the frequency of a routine, especially for datasets covering longer recording periods. In the context of recommendation, it could allow inferring on which days the user is more open to receiving feedback for behavior change.
When distances of individual behaviors from the average group behavior in diet and activity spaces were used to detect the individual’s group belonging, a classification accuracy below 70% was obtained. This result reflected the nonlinearity between health indicators and behaviors, meaning that the people having the same range of health indicators did not necessarily exhibit similar behaviors or that people with different health indicators could have exhibited similar behaviors, across different behavior spaces.
Finally, the comparison of the proposed health indicators-based clustering techniques against random clustering revealed that the results obtained in the diet space could have been produced by chance. This consideration also extends to the results on the regularity of diet behavior, particularly, because diet data were noisier and sparser (as shown in the example of behavior matrix in the diet space in Figure
Taken together, our results show the limits of applying eigendecomposition to model behavioral routines. In particular its sensitivity to noise and its assumption of linearity should be carefully accounted for, considering the sparse and complex nature of human behavior. The proposed DLM and validation technique allowed identifying the limitations of eigendecomposition and finding behavioral correlates of health indicators for our study population.
We believe that the DLM, here demonstrated as a tool for automatic analysis of behaviors from self-reports, has a potential use for the development of tailored recommender systems that account for differences in behaviors across individuals in different groups. In particular, it could be applied to characterize specific health profiles. Also, the proposed dartboard-like visualization could be used to facilitate the selection of advice to be given to an individual, not only on the basis of the collective behavior of the group to which he/she belongs, but also on the basis of his/her similarities to individuals belonging to other groups [
The major limitations of our study lie in the type of our data source (semantic information from self-report diaries) and in the size of our dataset (in terms of both number of participants and number of observations per subject). Self-reporting of diet and activity, in the form of diaries or 24-hour recall questionnaires, is widely employed in large epidemiological studies, as it represents a simple and readily available solution for collecting information on individual behaviors. However, self-reporting measures require considerable effort from the respondent, and lack of motivation, memory decay, and imprecision of recall can compromise the reliability of such information for accurate behavior tracking [
Self-reports and daily diaries, both in the form of manual annotation and of mobile applications, remain to date the most commonly used and intuitive method for recording behavior routines, either for personal self-tracking or for clinical prescription. In this work, we propose the DLM for analyzing lifestyle patterns emerging from self-reports of daily diet and activity. Despite the limited size of our dataset, we show the potential of the DLM for the identification of behavior routines and of their relationship with health indicators. We evaluated the limits of eigenbehavior decomposition when applied to behavior data for lifestyle applications. We also proposed a novel gamified representation of individual and group behaviors, which could be used to support the selection of advice in personalized recommender systems for behavior change.
The novelty of the DLM lies in the criteria that were used for grouping individuals by health indicator surrogates. We showed that such separation is partially reflected in group behaviors, for particular behavior spaces, which indicates that health condition and intrinsic physical characteristics can play a role in the exhibition of a particular behavior. Although at this stage causal relationships between behavior and health status cannot be proven, this result has potential for application in behavioral therapy for improving a person’s lifestyle and health.
Future extensions of the DLM will include the use of other sources of information, such as data from wearables and ubiquitous sensing and the validation of other behavior tracking algorithms for overcoming the limitations of eigenbehavior decomposition. In particular, alternative definitions of routines will be considered (e.g., the definition of de Lira et al. (2014) [
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors would like to thank Fangjing Wu, student at Technische Universiteit Eindhoven, Netherlands, for working with them in the preprocessing of the diary semantic data.