We present a system that uses spatial queries to retrieve video sequences created by tracking humans in a smart environment. Sketches made with a pointing device on the floor layout of the environment are used to form queries corresponding to locomotion patterns. The sketches are analyzed to identify the type of the query. Directional search algorithms based on the minimum distance between points are applied to find the best matches to the sketch. The results are ranked by similarity and presented to the user. The system was developed in two stages. An initial version of the system was implemented and evaluated by conducting a user study. Modifications were made where appropriate, according to the results and the feedback, to make the system more accurate and usable. We present the details of the initial system, the user study and its results, and the resulting modifications. The overall accuracy of retrieval for the initial system was approximately 93% when tested on a collection of data from a real-life experiment. This improved to approximately 97% after the modifications. The user interaction strategy and the search algorithms are usable in any environment for automated retrieval of locomotion patterns. The subjects who evaluated the system found it easy to learn and use. Their comments included several prospective applications for the user interaction strategy, providing valuable insight for future directions.
Multimedia retrieval and summarization for smart environments is an active research area with several applications such as surveillance, study of human behavior, and taking care of the elderly [
In some of these applications, such as long term surveillance and monitoring of patients suffering from dementia [
However, facilitating spatial queries for a smart environment is a difficult task due to two main problems. First, there should be an intuitive and nonrestricting way to input queries on human locomotion patterns into a computer. Second, algorithms for searching for specific movements have to be designed and developed.
Sketching is a common method used by humans to specify or describe patterns of movement. Because they share factors such as area, distance and direction, sketching and locomotion have an intuitive mapping between them. Sketching is a simple activity that almost everybody is capable of performing. Despite different sketching habits and techniques, many people are capable of interpreting sketches made by others; this suggests that there are some intuitive and widely accepted notations for sketching. All the computer platforms that are widely used today have user interaction capabilities that support sketching. Therefore, sketching is a highly prospective candidate for specifying locomotion patterns and synthesizing queries regarding the same.
In this research, we propose a novel user interaction strategy for spatial querying of locomotion patterns in smart environments. The user specifies a locomotion pattern to be searched for by sketching on a floor plan of the environment. Sketching is performed on a graphical user interface with a pointing device. Different types of queries are designed to enable searching for different locomotion patterns with simple sketches, intuitively and without ambiguity. We also design and implement algorithms for retrieving locomotion patterns represented by these queries.
We implement the proposed spatial querying system in a two-bedroom smart home [
However, this system facilitates only temporal queries. For example, it is possible to search for video showing people walking inside the house between 6:00 PM and 7:00 PM on a given date. We intend to enhance the capability of the above system by incorporating spatial queries to retrieve video correlated to locomotion patterns. Our objective is to facilitate queries such as: “Retrieve the video clips showing the people who walked from the living room to the study room in the morning of the 1st of May 2007”. Using the proposed user interaction strategy, we submit spatial queries by sketching paths on the floor layout of the house. The floor sensor data corresponding to the video clips are then searched for similar paths. The best matches are retrieved and shown to the user as video and footstep sequences. We evaluate the performance of the algorithm using a data set recorded during a real life experiment with actual residents.
The proposed user interaction strategy is designed in a two-stage process. First, the user interaction strategy is designed and implemented as an initial system that can be used for retrieval of locomotion patterns. We then design and conduct user studies both to identify sketch notations used for specifying locomotion patterns and to evaluate the usability of this system. Based on user feedback, we redesign the user interaction strategy and the search algorithms in order to achieve a better solution.
The rest of this article is organized as follows: Section
The research area of smart homes has been a growing area, combining advances in several technologies such as sensing, computing and storage [
There has been some research towards a framework for spatial querying of locomotion patterns. Egenhofer [
So far, there has been very little research on user interfaces for spatial querying. Ivanov and Wren [
In the following sections of this article, we describe how we approach the problem of spatial querying of locomotion patterns in smart environments based on the framework proposed in previous work, and complement it with a user interaction strategy that facilitates effective searching. We also present user studies designed both for gathering the information required to design a user interaction strategy that is intuitive and efficient, and for evaluating the proposed system.
Our objective here is to facilitate querying by allowing the users to specify the path they want to search for, by sketching that path on a diagram of the house floor plan. In order to design an interface that is versatile, easy to use and intuitive, we took the following, user-centered approach.
When a person is standing or walking inside a house, he is always referred to as being in a “region”. Therefore, it is desirable to facilitate specification of regions during querying. For this purpose, we partitioned the house floor plan into the regions labeled in Figure
Partitioning the smart home floor into regions.
The user interaction strategy should allow the user to submit all three types of queries in a simple and intuitive manner. There should be no ambiguity between any two types of queries. The complexity of the input is also important. Since queries of types 1 and 2 are less specific, it is desirable for them to be easier to enter.
To query for footstep sequences within a given region, the user scribbles within that region on the floor plan (Figure
Different types of spatial queries.
To query for movement between any two regions, the user simply draws a line between the two of them, without considering the walls that partition the regions (Figure
When querying for a specific path, the user sketches the path along the house plan (Figure
The personalized video from the ubiquitous home is created by selecting cameras and microphones automatically for footstep sequences of different persons [
Each footstep sequence is an ordered set of four-dimensional data elements with the following variables: time stamp indicating when the sensor was activated, duration that the sensor was active.
The
The array of pixel coordinates contained in the sketch has to be preprocessed before searching. First, the points are transformed from pixel coordinates to the house floor coordinates. Ideally, this should be possible using a linear transform if the house floor layout on the interface is drawn to scale. However, due to calibration errors in floor sensor data, minor adjustments are needed. The ordered set of points
The next step is to identify the type of the query by analyzing the distribution of points in
To perform this query type, we retrieve all footstep sequences present in the specified region, from the collection of tracking data. If a time interval is specified, the results are filtered using that interval. The results are ordered by the starting time of the sequences.
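The retrieval step for this query type can be sketched as follows. This is only an illustration under stated assumptions: each footstep sequence is taken to be a time-ordered list of samples with floor coordinates and a time stamp, the region is an axis-aligned rectangle, and a sequence passes the time filter if it overlaps the interval. The names `Step`, `Region` and `query_region` are hypothetical and do not come from the system.

```python
from typing import List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    x: float  # floor coordinate (assumed layout of a sample)
    y: float
    t: float  # time stamp of the sensor activation


class Region(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, s: Step) -> bool:
        return self.x_min <= s.x <= self.x_max and self.y_min <= s.y <= self.y_max


def query_region(
    sequences: List[List[Step]],
    region: Region,
    interval: Optional[Tuple[float, float]] = None,
) -> List[List[Step]]:
    """Return sequences with at least one footstep inside the region,
    optionally filtered by a time interval, ordered by starting time."""
    hits = []
    for seq in sequences:
        if not seq:
            continue
        # Keep only sequences that are present in the region.
        if not any(region.contains(s) for s in seq):
            continue
        # Assumed filter: the sequence must overlap the requested interval.
        if interval is not None:
            start, end = seq[0].t, seq[-1].t
            if end < interval[0] or start > interval[1]:
                continue
        hits.append(seq)
    # Results are ordered by the starting time of each sequence.
    return sorted(hits, key=lambda seq: seq[0].t)
```

In a deployed system the region test and the time filter would more likely be expressed as a database query; the linear scan here only makes the logic explicit.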
First, we retrieve all the footstep sequences that include floor sensor data from both regions, and filter them by the specified time interval. Thereafter, the following algorithm is applied to each candidate path Starting from Within the subsequence Select
Again, the matches are ordered by the starting time stamp.
We start the search by selecting all the footstep sequences recorded during the time interval that the user specified. For each of the candidate paths Set overall mean distance For the first point Add the Euclidean distance between Repeat the steps 1 and 2 for the next point in Divide If
This algorithm looks for paths with less deviation from the sketched path, while preserving direction. The threshold value
The matched paths are presented to the user in ascending order of the overall mean distance. Figure
A search query of type 3, with the best matching result.
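The type 3 search described above can be sketched as follows. This is one possible reading of the steps rather than the authors' implementation: for each sketched point, the closest point of the candidate path is searched for only from the previously matched position onward (which preserves direction), the Euclidean distances are accumulated and divided by the number of sketched points, and candidates whose overall mean distance exceeds a threshold are discarded. The function names and the threshold value passed by the caller are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def directional_mean_distance(sketch: Sequence[Point], path: Sequence[Point]) -> float:
    """Mean distance from each sketch point to its nearest path point,
    searching the path only forward from the last matched index."""
    if not sketch or not path:
        return math.inf
    total = 0.0
    start = 0  # index in the path from which the next search begins
    for sx, sy in sketch:
        best_i, best_d = start, math.inf
        for i in range(start, len(path)):
            d = math.hypot(path[i][0] - sx, path[i][1] - sy)
            if d < best_d:
                best_i, best_d = i, d
        total += best_d
        start = best_i  # never move backward along the candidate path
    return total / len(sketch)


def rank_matches(
    sketch: Sequence[Point],
    paths: List[Sequence[Point]],
    threshold: float,
) -> List[Tuple[float, Sequence[Point]]]:
    """Keep candidates within the threshold, in ascending order of distance."""
    scored = [(directional_mean_distance(sketch, p), p) for p in paths]
    kept = [(d, p) for d, p in scored if d <= threshold]
    kept.sort(key=lambda item: item[0])
    return kept
```

Because the search index never decreases, a path that follows the sketched route in the opposite direction accumulates large distances after the first matched point and is rejected by the threshold.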
We designed a graphical user interface based on the above strategy for retrieval of personalized video from the ubiquitous home. This interface is based on the concept of
User interaction with system: the circled numbers 1 to 5 correspond to steps 1 to 5 described in text.
All the pixels along the sketched path and the date and time interval (if submitted) are recorded as inputs for the search algorithm described in the following section. The time to make the sketch is recorded as an additional input.
The results retrieved by the first two types of queries are straightforward, since they are direct queries on a relational database followed by a linear search. In order to evaluate the performance of the search algorithm for query type 3, we conducted the following experiment. The house floor was partitioned into
The system was tested on a selected set of 94 footstep sequences obtained from 12 hours of data gathered during a “real-life experiment”, where a family of three members stayed in the ubiquitous home for 10 days. A set of 56 pairs of subregions was selected by observing these data.
During evaluation, five paths between each selected pair of subregions were sketched and results were retrieved. Both the instances where wrong paths were retrieved and correct paths were missed were recorded. The precision
The precision of retrieval was 92.5% and the recall was 98.8%. The balanced F-measure was 95.2%. Most of the mistakenly retrieved clips were candidate paths that are shorter than the sketch but match well with the corresponding segment of the sketch.
We conducted a user study on sketching spatial queries, with the following objectives. Analyze how people sketch locomotion patterns and identify the common sketch types and places where they disagree. Find out how people interpret sketches representing locomotion patterns. Evaluate the initial system described in Section Acquire feedback related to both the initial system and sketching locomotion patterns. Identify future directions and prospective applications of sketch-based querying of locomotion patterns.
Since we could not find an existing evaluation method that fulfills all of the above objectives, we designed and conducted our own user study. This study consisted of five sections. The first three sections consisted of three different tasks related to sketching locomotion patterns. The fourth section was a usability study of the initial system. The fifth section contained questions regarding sketch-based querying, and allowed the subjects to write their comments freely. The following subsections of the article describe the above in detail.
The objective of this section is to identify the notations that people use and the difficulties they encounter, when sketching a locomotion pattern that they haven’t experienced or seen. The section consists of 16 sketching tasks. In each task, the test subjects read a textual description of an instance of locomotion in the house (e.g., “walking from the living room to the study”) and sketched it on a floor plan of the house. The descriptions were selected in such a way that they describe different locomotion patterns. Some descriptions with a certain degree of ambiguity were deliberately included.
All the sketches were made on answer sheets with preprinted house floor layouts. After the tasks were completed, the subjects were asked whether there were any particular movement patterns that were difficult to sketch. They were allowed to describe freely (in writing), if there were such difficulties or comments.
The objective of this section is to identify the notations that people use and difficulties they encounter when sketching a locomotion pattern that they have seen or experienced. The section consisted of nine sketching tasks. In each task, the subject watched an animation showing a person moving on the house floor plan. These animations were created by observing patterns of movement exhibited by the residents of the house during previous real-life experiments [
All the sketches were made on answer sheets with preprinted house floor layouts. After the tasks were completed, the subjects were asked whether there were any particular movement patterns that were difficult to sketch. They were allowed to describe freely (in writing), if there were such difficulties or comments.
The objective of this section is to examine the users’ ability to learn a sketching notation for querying locomotion patterns, and to identify ambiguities and difficulties from a user’s perspective. While the first two sections consisted of general sketching tasks, this section was based on the initial system for retrieving locomotion patterns. The notation of the queries used in the initial system was explained to the subject, with examples. The subject was allowed to try at least one example query of each type, to become familiar with the system before the actual tasks began.
The section consisted of six tasks. During each task, the subject was shown a screen capture of a spatial query on the system, and asked to interpret the query and its type. While interpreting the type of a query is a task for the system, not the user, this question helps us to understand if there are situations where the user is not sure how to sketch a query, due to either ambiguity or lack of intuitiveness. Six screen captures, consisting of two queries from each type, were shown to the users in random order.
After the tasks were completed, the following questions were asked. Were there any movement patterns that were difficult to interpret? Describe briefly. Out of the sketch types used in this system, which type do you think is the most useful when specifying activity inside a house?
The first question was asked to identify any ambiguities or counterintuitive notations in the current set of query types. The second question seeks user preferences, which, if found, can indicate where more effort should be put into designing queries.
After using the system further if they thought it necessary, the subjects rated its usability by answering a questionnaire, based on the guidelines established by Chin et al. [
In this section of the experiment, the subjects answered the following general questions: Do you think it is easier to specify a pattern of movement inside a house by using a sketch than a verbal description? What are the other applications that you would suggest for a sketch-based interface that can accept movement patterns as input?
After answering the questions, the subjects were given the opportunity to make additional comments and suggestions in free format.
The procedure of the user study followed the guidelines used in a study for identifying how people sketch geographical information, by Blaser [
Each subject was briefed about the task at the beginning of each section, and also provided with written instructions. One of the authors was available throughout the experiment to provide additional clarifications if needed. Animations and video clips were replayed when the subjects found it necessary. Breaks, if the subjects needed any, were allowed between sections.
The subjects answered all the sections on answer sheets provided to them. In addition to the responses on paper, the time taken for the experiments and the stroke order of sketches were recorded.
The subjects took 24 to 40 minutes to complete the experiment. The average time consumed was 30.3 minutes. This time included short breaks between sections.
The responses gathered during the user study consisted of sketches, numerical ratings, and textual descriptions. Figure
Some example sketches by test subjects for different movement patterns: (a) walking inside the living room, (b) entering the house, (c) walking from the sofa to the table in the study, and (d) walking from the bedroom into the toilet and entering the living room after a short pause.
The textual descriptions provided in this section specified several primitive types of locomotion patterns: entering a region, moving within one or more specified regions, moving between regions, and so forth. We attempt to identify the sketching notations used by the subjects by separating the notations for these primitives. We use the comments by the subjects to identify the difficulties and ambiguities in sketching a locomotion pattern.
Figure
Sketch notations for staying at a location.
Most of the subjects (9 out of 16) used a small circle to indicate the location related to the description. A cross was the next most common, and there was no other common notation.
Figure
Sketch notations for locomotion within a region.
It is evident that most of the subjects (14 of 16) used closed or near-closed curves to indicate movement within a region. Only one subject used a notation similar to that used by the initial system (type 1 query). Two of the subjects used arrowheads to represent movement.
Figure
Sketch notations for movement between regions using arrows.
Sketch notations denoting a pause in movement.
Most of the subjects (12 of 16) used arrowheads to indicate the direction of movement, while the others implied the direction by the direction they used when sketching the line. The subjects drew arrowheads in different styles and stroke orders, as seen in Figure
For queries involving more than two regions (e.g., “Walking from the bedroom to the toilet, and then to the living room”), 9 out of 16 subjects used two line segments indicating a break at the toilet, whereas the others used only one line segment. While sketching these queries, 11 of the 16 subjects used arrowheads to indicate direction.
Entering a region was indicated using an arrow or a line leading into the region, while leaving the region was sketched using an arrow or a line leading away from a region. Again, 12 of 16 subjects indicated the direction using an arrowhead.
The subjects stated that they encountered difficulties in sketching the following descriptions (the number of subjects providing each response is shown in parentheses): entering corridor (8), passing through the corridor in either direction (6), standing inside the study room (4), from living room to kitchen (2).
The main reason for a description to be found “difficult to sketch” was the lack of specific information required for sketching. For instance, in the first description listed above, the users wished to know which entrance was used to enter the corridor. There are several entrances to the corridor, making it difficult to sketch. For the second, some of them wished to know both the entrance and the exit. For the third, the location where the person is standing is not given. Most of the users marked the circle toward the center of the room. However, it is evident that the subjects were less concerned about incomplete information once there was sufficient detail regarding the main action performed. Although the last description above does not mention the exact locations in the rooms, the main action described was “walking (moving)”, and only two subjects found it difficult to sketch.
The users found it much easier to complete this section, as demonstrated by their comments. There was no ambiguity in interpretation, since the subjects were able to see the movement by themselves. The notations used by most of the subjects were consistent with what they used during Section
Five of the subjects desired to show pauses of the moving person in their sketches. Figure
The users were able to make an accurate interpretation of the locomotion pattern specified by the sketch in most cases. The number of errors within the set of 96 subject-queries (16 users
Asked whether there were any patterns that were difficult to interpret, the subjects reported problems in identifying the particular query type in one of the queries (Figure
Type 2 query for movement from living room to the kitchen.
The subjects disagreed heavily on which type of querying is useful for specifying locomotion patterns inside a house. The following are the responses, with the number of respondents in parentheses: type 1 (3), type 2 (4), type 3 (5), all types (4).
Some of the subjects preferred the ability to specify minute details and hence preferred type 3. Some others were more interested in “region level locomotion information” provided by query type 2.
Providing additional comments, two of the subjects mentioned that query type 1 (scribbling in a region, corresponding to the presence within the region) is difficult both to sketch and interpret in narrow areas such as corridors.
Below we list the criterion descriptor, response mean, mode (in parentheses), and the range of responses for each criterion evaluated during the usability study: learning to use the system: 6.375 (6), 5–7; usefulness as a means of input: 5.625 (6), 4–7; ease of using the system: 5.813 (6), 4–7.
Most of the responses after the mode (6) were for response 7, and there was only one response at level 4 for each of the second and third criteria. The results show that the system is quite easy to learn and useful, despite the fact that it was designed without conducting a requirements study. We believe that the reason for this is the intuitive nature of querying for a locomotion pattern using a sketch.
Answers to the first question of this section, “Do you think it is easier to specify a pattern of movement inside a house by using a sketch than a verbal description?” are listed as follows (the number of subjects responding in this way is indicated in parentheses). Sketching is definitely easier (12). Sketching is easier in most cases (3). Sketching is not any easier than a verbal description (1).
While most of the subjects agreed that sketching is easier than describing verbally, the four subjects who disagreed stated that for some simple queries (such as entering a room with only one door), a textual description is easier than making a sketch.
The following are the answers to the second question “What are the other applications you would suggest for a sketch-based interface that can accept movement patterns as input?” organized in the same format as above: searching for movement patterns outdoors (6), finding routes in a city (4), search interfaces for cities, shopping malls and so forth (4), support for trip planning, by including waiting times and preferred routes (3), specifying activity in an environment (3), healthcare (2), query for player movements in sports videos (1), for specifying player movements in a computer game (1).
The large number of responses was encouraging, and pointed to various applications and future research directions. Most of the suggestions are for outdoor applications, suggesting that spatial querying is more intuitive and appropriate in outdoor scenarios. For applications in indoor environments, the main restriction is the need for deployment of sensors for accurate tracking. Healthcare and surveillance are the most prospective applications, given the importance of being able to retrieve movement patterns in such applications. For outdoor applications, tracking data are easier to obtain with the availability of GPS data. While it is not possible to obtain video from very large areas, the movement patterns of vehicle fleets can easily be retrieved and shown on a map, using the proposed interaction strategy. Search in sports video is quite prospective as an immediate application in a medium-sized environment.
More than half of the subjects provided additional comments at the end of Section It is desirable to be able to sketch paths consisting of multiple segments. Different body gestures and actions such as “standing, sitting, running, sleeping” should be added. Instead of sketching an entire path, it should be possible to sketch using a set of points. The system should be extended to query for activities inside the house by sketching. It is possible to combine menu-based querying and sketch based querying, to provide users more choice.
Most of the subjects desired to see more functionality and control. Some others looked for more flexibility in entering queries, as indicated by the last comment.
The responses described in Sections
While most of the subjects drew arrowheads to indicate direction, the diversity of shapes and stroke-orders indicates that it will be difficult to interpret all of them automatically with sufficient accuracy. We suggest a semiautomated solution to this problem. The user can draw a line, to which an arrowhead can be added automatically. The direction the line is drawn can be used to determine the direction of the arrow. While this imposes a constraint on the direction of sketching, the results show that the users adhered to this rule even without imposing it.
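The semiautomated arrowhead suggested above can be illustrated with a small geometric sketch. It assumes a stroke is an ordered list of pixel coordinates in the order they were drawn, and it derives the arrow direction from the final segment of the stroke. The function name `arrowhead`, the wing length and the half angle are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def arrowhead(
    stroke: List[Point],
    length: float = 10.0,
    half_angle_deg: float = 25.0,
) -> Tuple[Point, Point, Point]:
    """Return (left wing, tip, right wing) of an arrowhead placed at the
    end of the stroke and pointing in the direction the stroke was drawn."""
    if len(stroke) < 2:
        raise ValueError("a stroke needs at least two points")
    tip = stroke[-1]
    # Walk back to the most recent point that differs from the tip.
    for prev in reversed(stroke[:-1]):
        if prev != tip:
            break
    else:
        raise ValueError("stroke has zero length")
    heading = math.atan2(tip[1] - prev[1], tip[0] - prev[0])
    half = math.radians(half_angle_deg)
    wings = []
    for sign in (1, -1):
        # Each wing points backward from the tip, rotated by the half angle.
        a = heading + math.pi + sign * half
        wings.append((tip[0] + length * math.cos(a), tip[1] + length * math.sin(a)))
    return wings[0], tip, wings[1]
```

Using only the final segment keeps the arrow aligned with the end of a curved path; averaging over the last few points would be a reasonable alternative for noisy strokes.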
The subjects did not find ambiguous or imprecise descriptions to be a problem, as long as the main action concerned with querying was described (Section
It is evident that the notation used in the initial system for the type 1 query is not intuitive, and was actually used by only one of the subjects. This should be replaced using a closed curve, which was the choice of most of the subjects. Further, this change will make the query usable in querying for locomotion in outdoor environments that do not possess the strong and fixed partitioning present in a house.
However, the notation for Type 2 queries creates a different situation. Although none of the subjects used it, including the query type in the user interaction strategy and making them aware of it will facilitate faster querying. As demonstrated in Section
According to these observations and the results of the quantitative evaluation described in Section
Changes were made to the notation for the three query types, to make them more intuitive and less ambiguous. The notation for a type 1 query was changed to a closed curve, instead of the scribbling used in the initial system. Using a closed curve is intuitive according to the results of the user study, and reduces ambiguities. If the area enclosed by the curve is larger than that of the largest ellipse contained in the same bounding box as the sketch, the bounding box is selected as the search region. Further, if each dimension of the bounding box is within a 10% deviation from that of a room/region that it is contained in, the entire room/region is considered for searching. All the closed curves drawn by the subjects to specify a room/region conformed to these thresholds. This strategy facilitates querying within an entire room without making a precise sketch enclosing its area, while maintaining the ability to query for arbitrarily shaped regions.
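The region selection rules for the new type 1 notation can be sketched as follows. The sketch assumes that the closed curve is given as a polygon in floor coordinates, that rooms are axis-aligned rectangles, and that the room rule takes precedence over the bounding-box rule; the ellipse area is computed as π/4 times the width and height of the bounding box. The function names and the returned labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def polygon_area(points: List[Point]) -> float:
    """Area enclosed by a closed curve, using the shoelace formula."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def bounding_box(points: List[Point]) -> Box:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def select_search_region(curve: List[Point], rooms: Dict[str, Box], tolerance: float = 0.10):
    """Decide whether to search a whole room, the bounding box, or the curve."""
    box = bounding_box(curve)
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    # Rule 2: a box nearly as large as its containing room selects the room.
    for name, (rx0, ry0, rx1, ry1) in rooms.items():
        inside = rx0 <= x0 and ry0 <= y0 and x1 <= rx1 and y1 <= ry1
        rw, rh = rx1 - rx0, ry1 - ry0
        if inside and abs(w - rw) <= tolerance * rw and abs(h - rh) <= tolerance * rh:
            return ("room", name)
    # Rule 1: a curve enclosing more than the inscribed ellipse selects the box.
    if polygon_area(curve) > math.pi / 4.0 * w * h:
        return ("box", box)
    # Otherwise the arbitrarily shaped region itself is searched.
    return ("curve", list(curve))
```

For example, a roughly rectangular curve drawn just inside the walls of a room resolves to the whole room, whereas a diamond-shaped curve, which encloses less area than the inscribed ellipse, is kept as drawn.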
Figure
Type 1 queries using the new notation.
For both type 2 and type 3 queries, the interface was modified to add an arrowhead automatically to strokes that are not closed curves. A query was identified as type 2 only if the sketch crosses a partition. This removes the ambiguity described in Section
In order to increase the precision of retrieval, the matching algorithm was modified slightly. The retrieved results were filtered to remove false positive results, using the distances between the starting points and ending points of the sketch and the retrieved path.
The normalized distance between starting points,
A candidate path
The experiment described in Section
We have proposed a user interaction strategy and a search algorithm for querying and retrieving from a collection of video sequences based on spatial information. These can be applied to the results of human tracking based on any type of sensors, and therefore are not restricted to environments with floor sensors. The accuracy of retrieval, when applied to real-life data from a home-like environment, was approximately 95%.
A user study was conducted both to identify how people sketch human locomotion patterns and to evaluate the proposed system in terms of usability. Changes were made where appropriate, according to the user feedback and the results of the quantitative evaluations. After the modifications, the accuracy of the system increased to about 97%, and the system became more intuitive to use. The user interaction strategy and search algorithms can be employed to query any type of tracking data from different environments, with minor modifications such as changes of area layouts.
Creating a formal model for the queries including time and speed will increase the versatility of spatial queries, and we are working in this direction at the moment. The feedback from the test subjects indicates that there are several prospective future directions. These include searching for locomotion patterns in outdoor environments, querying for player behavior in sports videos, and travel planning. Work on a system for retrieval of locomotion patterns using continuously archived GPS data is currently in progress.
The authors thank the voluntary subjects who participated in the user study, for their contribution. This work was partially supported by NICT, CREST and JST of Japan.