Human beings communicate in abbreviated ways dependent on prior interactions and shared knowledge. Furthermore, humans share information about intentions and future actions using eye gaze. Among primates, humans are unique in the whiteness of the sclera and the amount of sclera shown, which is essential for communication via interpretation of eye gaze. This paper extends our previous work in a game-like interactive task by the use of computerised recognition of eye gaze and fuzzy signature-based interpretation of possible intentions. This extends our notion of robot instinctive behaviour to intentional behaviour. We show a good improvement in speed of response from a simple use of eye-gaze information. We also show a significant and more sophisticated use of the eye-gaze information, which eliminates the need for control actions on the user's part. We also suggest a way of returning visibility of control to the user in these cases.
We consider an interactive task which involves a human-controlled “robot” (called a “foreman,” effectively an onscreen agent) and a pair of assistant robots. We introduce the technology of fuzzy logic, which is used by the robots to infer human intentions. We briefly reprise the fundamentals of fuzzy logic in a nonmathematical fashion, followed by a definition of fuzzy signatures. The application of our fuzzy signatures to modelling eye-gaze paths is then discussed. Eye gaze is important in human communication [
The increasing availability of relatively inexpensive and reliable eye-gaze detectors has sparked much interest in their potential use in games. These uses fall into four main areas:
(1) use of eye gaze to improve rendering, where the region of interest or attention is determined from the eye gaze and more detail is shown in these regions [
(2) avatar realism for enhanced interaction with humans, where an avatar will use some plausible eye-gaze motions [
(3) therapeutic games, for example, in training children with autism spectrum disorder [
(4) use of the user’s eyes as active game controls via their eye gaze, which has generated significant interest, including use of eye gaze for pointing [
The first three of these areas should be seen as incremental enhancements of current games technologies or an enhanced application. The last of them is a novel development. This notion has potential disadvantages, notably user fatigue, as fine eye control can be tiring [
None of these uses of eye-gaze information is similar to our notion of unobtrusive inference from the user’s eye gaze. To enhance the immersive experience of a good computer game, we believe our unobtrusive technique of inference with fuzzy signatures and possibility will be of significant benefit, as it does not depend on a user using a normal human action out of its usual context.
A group of autonomous intelligent robots is supposed to build the actual configuration according to the exact instructions given to the “robot foreman” (
The individual objects can be
shifted or rotated, but two robots are always needed to actually move an
object, as they are heavy. If two robots are pushing in parallel, the object
will be shifted according to the joint forces of the robots (see Figure
Task configurations.
Push, rotate, and stop configurations: two robots can push an object along or rotate it. A stop configuration would only occur when the user is intending to stop assistant robots from completing the wrong task.
If the two robots are pushing in the opposite directions positioned at the diagonally opposite ends, the object will turn around the centre of gravity. If two robots are pushing in parallel, and one is pushing in the opposite direction, the object will not move.
Under these conditions, the task can be solved if all robots are provided with suitable algorithms that enable “intention guessing” from the actual movements and positions, even though these might be ambiguous.
Figure
A random initial configuration of objects (dark squares) and robots (circles, the darker circle is the “foreman”) and target configuration (asterisks).
A sample path to the target.
The tasks/experiments performed using our simulator ranged from the control case of two humans operating their own robots, through a human-operated foreman with an assistant robot throughout, to a human-operated foreman and assistant robot starting the task, which is then completed by the two assistant robots.
Most of our experimental subjects/users commented that this task could easily be generalised to a game, particularly in the last case, as they could see the benefit of assistants who could complete tasks once the decision was made and (implicitly) communicated by the human operator of the foreman.
It seems to us that the key benefit comes from the ability to implicitly communicate the task. There seems to be a benefit from starting a task and then having some automatic process finish it for you. The ability to “show what needs to be done” eliminates any need for some kind of language (which the user would need to learn) to specify the task.
To make the implicit communication effective, we have implemented a form of “instinctive behaviour” for the robots. These behaviours are selected by a robot, when it is unable to work out a more intentional action based on the current situation. At this time, these behaviours only include following the foreman around, and moving to the nearest object. Thus, the assistant robots are more likely to be “nearby” when they can infer the human’s task. Hence, their assistance arrives sooner.
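As a rough illustration of this fallback selection, the Python sketch below chooses an instinctive action only when no intentional task can be inferred; the distance threshold, function names, and selection rule are our assumptions for illustration, not the implementation used in the experiments.

```python
import math

def nearest(position, targets):
    """Return the target closest to position (Euclidean distance)."""
    return min(targets, key=lambda t: math.dist(position, t))

def choose_action(robot_pos, foreman_pos, object_positions, inferred_task=None):
    """Fall back to 'instinctive' behaviours when no task can be inferred."""
    if inferred_task is not None:
        return ("assist", inferred_task)            # intentional behaviour
    if math.dist(robot_pos, foreman_pos) > 5.0:     # assumed 'too far' threshold
        return ("follow_foreman", foreman_pos)      # instinct 1: stay near the foreman
    return ("approach_object", nearest(robot_pos, object_positions))  # instinct 2
```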
We first reprise fuzzy logic/signatures.
Fuzzy logic is an extension of traditional binary (crisp) sets: membership in a fuzzy set can take any value from 0 to 1. So, a set of tall people would include someone only very slightly taller than average with a value of 0.51 and someone slightly shorter, who would have a value of 0.49. If we were to use their height to predict their weight, we would get much better results using 0.51 and 0.49 than using the rounded values of 1 and 0 (i.e., 0.51 is quite similar to 0.49, while 1 is not very close to 0).
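As a minimal sketch of this idea (the height breakpoints below are illustrative assumptions), a graded membership function keeps near-identical heights near-identical in membership, whereas crisp rounding forces them to opposite extremes:

```python
def tall_membership(height_cm: float) -> float:
    """Graded membership in the fuzzy set 'tall' (illustrative breakpoints)."""
    if height_cm <= 160.0:
        return 0.0
    if height_cm >= 190.0:
        return 1.0
    return (height_cm - 160.0) / 30.0   # linear ramp between 160 cm and 190 cm

# Two people of almost identical height get almost identical memberships
# (about 0.52 and 0.48) instead of the crisp values 1 and 0.
print(tall_membership(175.5), tall_membership(174.5))
```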
Fuzzy signatures provide a hierarchical structure for organising data, similar to the way human beings structure their thoughts [
Each signature corresponds to a nested vector structure/tree graph [
The key components are combined in a structured way at the top; each component can have some substructure, each of those can have their own substructure, and so on. We construct these fuzzy signatures directly from data [
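A minimal sketch of such a nested structure in Python follows; the attribute names and the use of a simple mean to aggregate children are illustrative assumptions, not the aggregations used in the paper.

```python
# A fuzzy signature as a nested (tree) structure: leaves hold membership
# values in [0, 1]; interior nodes aggregate their children.
signature = {
    "approach_object": {
        "distance_to_object": 0.8,
        "waiting_time": {"short": 0.2, "long": 0.7},
    },
    "follow_foreman": {"distance_to_foreman": 0.4},
}

def aggregate(node):
    """Collapse a nested signature bottom-up into a single value (mean here)."""
    if isinstance(node, dict):
        children = [aggregate(v) for v in node.values()]
        return sum(children) / len(children)
    return node

print(aggregate(signature))
```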
In order to construct the fuzzy signatures for inferring the foreman’s next action, we need to work out which “attributes” are essentially related to the foreman’s intentional action in the current situation. By measuring these “essential attributes,” the other robots may be able to tell what type of action the foreman is going to carry out, so that they can go and work with the foreman cooperatively. Since the current situation contains a set of objects, if the foreman intends to do something, he can go and touch a particular object first, or at least move closer to it. So, the first “essential attribute” should be the “distance” between the foreman and each object in the environment. Figure
Membership function of distance.
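A minimal sketch of such a “near” membership over distance is shown below; the breakpoints are illustrative assumptions rather than the values behind the figure.

```python
def near_membership(distance, full=1.0, zero=6.0):
    """Degree to which the foreman is 'near' an object.

    Fully near (1.0) within `full` grid cells, not near (0.0) beyond
    `zero` cells, and linear in between (illustrative breakpoints).
    """
    if distance <= full:
        return 1.0
    if distance >= zero:
        return 0.0
    return (zero - distance) / (zero - full)

print(near_membership(2.5))   # 0.7
```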
Proximity is used to infer intention; however, there is a situation that cannot be handled by “distance” alone: if the foreman moves towards an object and touches it, but then moves away or switches to another object immediately, the other robots still cannot infer what the foreman is going to do. In order to solve this problem, we add another “essential attribute” called “waiting time” (its membership function is similar in shape to Figure
By combining the “waiting time” with the previous attribute, “distance,” the final fuzzy signatures for intention inference can be extended. Under this circumstance, other robots will be able to infer the foreman’s next action according to his current behaviour. For instance, if the “distance” between the foreman (
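As a rough, self-contained illustration of combining these two attributes (the breakpoints and the use of min as the fuzzy AND are our assumptions, not the paper’s exact signatures):

```python
def intention_degree(distance, waiting_steps, near_radius=6.0, wait_full=5):
    """Degree of belief that the foreman intends to act on this object."""
    near = max(0.0, 1.0 - distance / near_radius)   # 'near' membership
    waited = min(1.0, waiting_steps / wait_full)    # 'has waited long enough'
    return min(near, waited)                        # fuzzy AND of the two

# An assistant commits to the object with the highest degree, provided it
# exceeds some threshold (e.g. 0.6).
print(intention_degree(distance=1.0, waiting_steps=4))
```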
So far, we have discussed the problem of inferring the foreman’s intentional action by constructing fuzzy signatures based on the foreman’s current behaviour. In some sense, this means the other robots still depend entirely on the foreman. The robots may now be able to work with the foreman cooperatively, but this does not yet show that they are intelligent enough to help the foreman finish the final task effectively and efficiently, nor to truly reduce the cost of communication.
To improve our modelling technique, we need to consider the situation after each movement of an object: the other robots should be able to judge which target object shape is the most possible one, according to the foreman’s previous actions and the current configuration of objects.
The solution is to measure how closely the current object configuration matches each of the possible shapes after the foreman’s intentional actions.
Therefore, apart from the previous fuzzy signatures, we construct another data structure to model the robots’ further decision making (see Figure
Figure
(i) if the foreman and a robot push an object to a place which matches one of the possible object shapes:
(ii) if the foreman and a robot push an object to a place which does not match any of the possible object shapes, then none of the possibility values will change;
(iii) if the foreman and a robot push an object which matched
(iv) if the foreman and a robot push an object which matched
(v) if two robots (where neither is the foreman) push an object to a place which matches one of the possible object shapes:
The above points recognise that only the user can
substantially alter possibility values (the representation of user intentions) for task goals.
Structure of pattern matching with possibility calculation.
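A minimal sketch of how such a possibility update might look is given below; the update amounts and the clipping to [0, 1] are our assumptions, since the text above only specifies which events change the values and that only the user’s (foreman’s) moves alter them substantially.

```python
def update_possibilities(possibilities, matched_shape, moved_by_foreman):
    """Update the possibility value of each candidate target shape.

    `possibilities` maps shape name -> possibility in [0, 1].
    """
    if matched_shape is None:
        return possibilities                      # no match: values unchanged
    boost = 0.3 if moved_by_foreman else 0.05     # only the foreman's moves shift values much
    updated = dict(possibilities)
    updated[matched_shape] = min(1.0, updated[matched_shape] + boost)
    return updated

goals = {"horizontal_rows": 0.25, "vertical_rows": 0.25, "t_shape": 0.25, "u_shape": 0.25}
print(update_possibilities(goals, "t_shape", moved_by_foreman=True))
```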
The inherently hierarchical nature of fuzzy signatures and the ability to deal with complex and variable substructure make them ideally suited to the abstract representation (in the form of a polymorphic version of fuzzy signatures) or storage of eye-gaze path data (simple fuzzy signatures).
A possible mapping of
eye-gaze path data for a set of artificial eye-gaze paths (see Figure
Artificial images showing eye gaze on a face, for illustration only. Note similarity of eye-gaze path near eyes, but not near mouths.
Fuzzy signature structure, showing top and two arbitrary levels.
For a fuzzy signature with 2 arbitrary levels.
For an arbitrary branch
The raw eye-gaze path is quite complex and difficult to interpret.
The scenario we use is quite simple, and eye gaze was recorded from the presentation of the scenario up to the user-identified decision point. The users were instructed to indicate when they would otherwise start moving the foreman. Figure
Eye-gaze path for 1 subject, experiment 3, horizontal rows configuration.
We can readily identify the most important part of this path. We have found from their eye gaze (in our restricted domain) that users seem to automatically recheck their decision just prior to implementation.
This is consistent with work on decision making in soccer [
In Figure
Last part of an eye-gaze path, reduced to fixations and superimposed on the task image.
In the next section, we describe in detail how we perform task inference using fixation information from the eye-gaze path.
The duration of each fixation is represented by the size of the black circle overlaid on the image. We show in Figure
Fixations overlaid on part of scene.
The fixation circles overlaid on the scene, whose size represents fixation duration, were kept small so that the scene remains visible; they therefore do not represent the probable area of interest indicated by the user. In Figure
Detail of fuzzy eye-gaze inference.
We will consider partial inference regarding the columns in our scene.
Fixation 1 projects onto a trapezoidal fuzzy membership function which represents the degree of possibility from 0 to 1 of the user’s eye gaze indicating user interest in that region. As mentioned earlier, the size of black circles was kept small to not obscure the scene. So, the size of the circles does not represent the size of the likely region of interest. Thus, the core of the fuzzy membership function (horizontal top part with value 1) is wider than the size of the fixation circle.
We perform the same projection for fixation 2.
We then combine the two fuzzy membership functions using a union operator. The union can produce a concave or a convex result. In this case, as the cores overlap, the result is convex: a wider trapezoidal fuzzy membership function. This is projected onto the scene. As our scene is discretised, we can represent the result visually as areas truncated from discrete rectangles.
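A minimal sketch of this projection and union step follows; the core and support widths (in assumed pixel units) are illustrative, and max is used as the union operator.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear ramps between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def fixation_membership(x, centre, core=30.0, support=60.0):
    """Membership induced by one fixation; the core is wider than the drawn circle."""
    return trapezoid(x, centre - support, centre - core, centre + core, centre + support)

def column_interest(x, fixation_centres):
    """Union (max) of the memberships from all fixations along one axis."""
    return max(fixation_membership(x, c) for c in fixation_centres)

# Two fixations whose cores overlap yield a single, wider trapezoid.
print(column_interest(215.0, [200.0, 230.0]))
```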
An identical process is followed with regard to inference along the rows. Clearly, the next step is to combine the horizontal and vertical results of the inference. We show the results in Figure
Possible regions of interest.
We combined the results of the vertical and horizontal fuzzy inference using intersection, and show this by intersecting the diagram representations of the results used before. The intersections are shown shaded in Figure
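Continuing in the same spirit, the self-contained sketch below intersects the horizontal and vertical inference with min; the widths and the 0.5 threshold are illustrative assumptions.

```python
def axis_interest(v, centres, core=30.0, support=60.0):
    """Max over fixations of a trapezoidal membership along one axis."""
    def memb(value, centre):
        d = abs(value - centre)
        if d <= core:
            return 1.0
        if d >= support:
            return 0.0
        return (support - d) / (support - core)
    return max(memb(v, c) for c in centres)

def region_interest(x, y, fix_x, fix_y):
    """Intersection (min) of horizontal and vertical eye-gaze inference."""
    return min(axis_interest(x, fix_x), axis_interest(y, fix_y))

# Scene cells whose combined value exceeds a threshold (say 0.5) form the
# shaded possible regions of interest.
print(region_interest(210.0, 150.0, fix_x=[200.0, 230.0], fix_y=[140.0]))
```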
We have made a number of simplifying assumptions above.
For example, the graphical treatment used required rectilinear regions of interest.
In Figure
Plausible regions of interest.
On the left in Figure
We list the remaining simplifying assumptions here: use
of union and intersection only, while many other aggregation operators are
available in this range [
We reprise our previous results for comparison purposes. We performed 3 initial experiments.
Although we allowed players to have verbal communication in experiment 1 (see Table
Two human-controlled robots (control case).
Experiment 1 | Horizontal rows | Vertical rows | T-shape | U-shape |
---|---|---|---|---|
Robot A (human) | 163.0 | 136.8 | 149.2 | 127.4 |
Robot B (human) | 141.6 | 159.0 | 151.2 | 143.4 |
Total steps | 304.6 | 295.8 | 300.4 | 270.8 |
Shifting movements | 40.0 | 43.0 | 42.8 | 38.6 |
Rotating movements | 7.2 | 6.8 | 7.2 | 5.6 |
Total movements | 47.2 | 49.8 | 50.0 | 44.2 |
Time ( |
Therefore, it is possible for them to decide to move different objects at the same time rather than aiming at the same target or placing the same object with different route plans, which will cost them extra steps to reach the common target or correct previous incorrect actions. That is, even with the explicit communication (talking) possible, it may be that it is only after incompatible moves that humans notice that they are following different plans.
The results in experiment 2 are quite good compared with the other two experiments (see Table
One assistant robot helps human throughout.
Experiment 2 | Horizontal rows | Vertical rows | T-shape | U-shape |
---|---|---|---|---|
Foreman (human) | 112.4 | 110.6 | 113.6 | 108.4 |
Robot A | 153.6 | 141.4 | 156.4 | 143.2 |
Total steps | 266.0 | 252.0 | 270.0 | 251.6 |
Shifting movements | 39.2 | 40.6 | 41.0 | 36.8 |
Rotating movements | 6.8 | 4.8 | 4.8 | 4.8 |
Total movements | 46.0 | 45.4 | 45.8 | 41.6 |
Time ( |
Since the robot with the codebook could infer the human-controlled foreman robot’s action by observation and cooperate with it, it is not necessary for the player to communicate with the other robot directly, which is different from the situation in experiment 1.
So the players can make their own decisions without any other disturbance, which may be what leads to an improvement in all the costs, including robot steps, object movements, and time.
Apart from the second test case (vertical rows), the robots in experiment 3 made the most object movements in the remaining test cases (see Table
Human and assistant start task, robots finish.
Experiment 3 | Horizontal rows | Vertical rows | T-shape | U-shape |
---|---|---|---|---|
Foreman (human) | 28.6 | 26.8 | 29.0 | 24.4 |
Robot A | 115.6 | 103.8 | 118.2 | 106.8 |
Robot B | 143.4 | 142.8 | 150.0 | 132.0 |
Total steps | 287.6 | 273.4 | 297.2 | 263.2 |
Shifting movements | 42.0 | 40.2 | 43.6 | 41.8 |
Rotating movements | 7.4 | 4.8 | 7.2 | 6.0 |
Total movements | 49.4 | 45.0 | 50.8 | 47.8 |
Time ( |
In most of the test cases, the total steps made in experiment 3 are more than in experiment 2 but still fewer than with robots controlled entirely by humans. This is of course the key benefit of our work: the task is completed, and completing it faster than two humans is an excellent result.
We now report the results of two further experiments using eye gaze.
The results in experiment 4 show that on average, the
tasks were completed some 14% faster than in experiment 3 (see Table
Eye-gaze indication of object and target.
Experiment 4 | Horizontal rows | Vertical rows | T-shape | U-shape |
---|---|---|---|---|
Foreman (human) | 29.6 | 26.8 | 25.2 | 28.0 |
Robot A | 89.6 | 89.6 | 106.4 | 92.6 |
Robot B | 131.8 | 118.0 | 132.4 | 125.4 |
Total steps | 251.0 | 234.4 | 264.0 | 246.0 |
Shifting movements | 41.2 | 40.4 | 43.6 | 41.4 |
Rotating movements | 6.4 | 4.0 | 6.0 | 6.0 |
Total movements | 47.6 | 44.4 | 49.6 | 47.4 |
Time ( |
This reduction is due to a helper robot starting to move to the correct object at the same time as the foreman and due to time saved during rotation.
Since the robot can only infer human actions, once a sequence of shifts is complete and a rotation is needed, the human-controlled foreman robot stops pushing the object and just waits.
Then, the assistant robot normally takes some time to move around the object once it infers that the shifting task is complete. While some of our users reported some boredom waiting at these stages, our results from experiment 2 show that the overall task was still completed quickly.
Here, the robot “knew” the target (it can infer this from identification of the object and target, and constantly updates its path planning), so it could immediately move to the correct spot on the object to make the required rotation.
In experiment 4, after the first object was in place, the assistant robots completed the task, achieving results comparable to those in experiment 2.
In this experiment, eye gaze was used to initialise the possibility values of the targets, and the two assistant robots started and completed the task, without the user moving the foreman (see Table
Eye gaze used to initialise task goal.
Experiment 5 | Horizontal rows | Vertical rows | T-shape | U-shape |
---|---|---|---|---|
Robot A | 132.2 | 115.4 | 131.4 | 124.0 |
Robot B | 143.2 | 123.6 | 135.0 | 130.2 |
Total steps | 275.4 | 239.0 | 266.4 | 254.2 |
Shifting movements | 43.0 | 41.0 | 44.0 | 42.0 |
Rotating movements | 8.0 | 4.0 | 6.0 | 6.0 |
Total movements | 51.0 | 45.0 | 50.0 | 48.0 |
Time ( |
The results are 5% worse than in experiment 4, but there was no guidance via the foreman robot: all inference was via eye gaze.
Our experiments have demonstrated that with suitable AI techniques (our fuzzy signatures), it is possible for robots (agents) to correctly infer human actions in a game-like interactive and cooperative task.
We have further shown that we can extend the notion of inference from the actions of the human to inference from their eye gaze. Such work is the first steps towards computing devices which understand what we want in ways similar to other human beings.
In our experiments, we had a clear indication of the decision point due to the way we structured the recording of eye-gaze information. In practice, while extracting the decision point may be possible from the eye gaze alone, we propose an alternative: a control device which the user could use to see a display of their inferred intentions, together with a “do it” button. Of course, it is always possible that some will choose the “do it” immediately.
There seem to be two benefits of such a generic “show me and do it” control device. Firstly, to return the visibility of control to the user. That is, while the user is always in control, we want that to be explicitly visible to the user. This seems to us particularly significant for immersive computer games.
Secondly, human beings are often multitasking. In a general setting beyond our experiments, it is likely that users would receive phone calls and make eye-gaze gestures while talking, and would not like such actions to control their game or editing task. Hence the need for a control which says “take note of my recent eye-gaze behaviour and act appropriately.”
So, how useful will eye gaze be for computer games? Of the areas we identified, avatar realism has some advantages, but the difference between how humans respond to other humans versus avatars may limit the degree to which such agents can enhance games. In the area of active game controls, there is some scope; however, the need to remain close to “normal” uses of people’s eye gaze again limits their use (except perhaps in rehabilitation and “therapeutic” games). In many first-person shooter games, the action is always at the centre of the screen, so there seems little eye gaze can add, except perhaps in our fashion: to infer user intentions and to rotate the world for the user automatically.
Finally, we conclude that our techniques have some benefit in an assistive fashion, and that eye-gaze technology can be used to further enhance the immersive quality of games, but that eye gaze is unlikely to lead to a qualitative change in the nature of computer games.
James Sheridan was invaluable in the collection of eye-gaze data for this experiment. Thank you very much, James. The Seeing Machines eye-gaze hardware and faceLAB software were purchased using an ANU MEC Grant.