Frequent Pattern Mining of Eye-Tracking Records Partitioned into Cognitive Chunks

Assuming that scenes would be visually scanned by chunking information, we partitioned fixation sequences of web page viewers into chunks using isolate gaze point(s) as the delimiter. Fixations were coded in terms of the segments in a 5 × 5 mesh imposed on the screen. The identified chunks were mostly short, consisting of one or two fixations. These were analyzed with respect to the within- and between-chunk distances in the overall records and the patterns (i.e., subsequences) frequently shared among the records. Although the two types of distances were both dominated by zero- and one-block shifts, the primacy of the modal shifts was less prominent between chunks than within them. The lower primacy was compensated by the longer shifts. The patterns frequently extracted at three threshold levels were mostly simple, consisting of one or two chunks. The patterns revealed interesting properties as to segment differentiation and the directionality of the attentional shifts.


Introduction
Eyes seldom stay completely still. They continually move even when one tries to fixate one's gaze on an object because of the tremors, drifts, and microsaccades that occur on a small scale [1]. Hence, researchers need to infer a fixation from consecutive gaze points clustered in space [2]. We may regard such a cluster of gaze points as a perceptual chunk, a familiar term in psychology after Miller [3] in referring to a practically meaningful unit of information processing.
During fixation, people closely scan a limited part of the scene they are interested in. They then quickly move their eyes to the next fixation area by saccade, which momentarily disrupts vision. However, the disruption normally goes unnoticed because the visual system produces continuous transsaccadic perception [4][5][6]. This means that successive fixations constitute a higher order chunking over and above the primary chunking of gaze points. Put metaphorically, the ⟨gaze, fixation, fixation-chunking⟩ relationship is analogous to the ⟨letter, word, phrase⟩ relationship. For the sake of brevity, a chunk of fixations will be referred to as a chunk.
In viewing natural scenes or displays, a chunk continues to grow until interrupted by one or more isolate gaze points resulting from drifting attention or by accident. These do not participate in any fixation. Whatever causes the interruption, we believe that such isolate points serve as chunk delimiters, like the pauses in speech. As a pause can be either short or long, interruptions by isolate points can vary in length. Figure 1 illustrates two levels of chunking: (a) chunking of gaze points into fixations and (b) chunking of consecutive fixations with and without interruption.
Granting our conjecture, one may still wonder what particular merits will accrue from the analysis of chunks in lieu of ordinary plain fixation sequences. The expected merits are twofold: separation of between- and within-chunk patterns and extraction of common patterns across records. Neither of these is attainable when dealing with multiple records by heat maps of fixations accumulated with no regard to sequential connections [7], by network analysis of the adjacent transitions accumulated within and between records [8][9][10], or by scan paths that would be too complicated [11] unless reduced to frequently shared subpaths. The key to understanding this point lies in the structure of fixation sequences as explained below.
1.1. Structure of Fixation Sequences. Equation (1) presents two types of fixation sequences, one plain and the other partitioned, both arranged in time sequence. The former serves as a basis for heat maps, scan paths, and network analysis. The latter incorporates chunks delimited by isolate gazes or any other appropriate criterion. The essential nature of the sequences remains the same when fixations are coded in areas of interest (AOI) or grid-like segments.

Plain and Partitioned Fixation Sequences. Consider

$$\text{Plain: } F_1 F_2 F_3 \cdots F_n, \qquad \text{Partitioned: } [F_1 F_2 \cdots][\cdots] \cdots [\cdots F_n], \tag{1}$$

where $F_i$ denotes the $i$th fixation and each pair of brackets encloses one chunk.

Although it was not explicitly stated, McCarthy et al. [12] in effect extracted chunks from partitioned sequences in their work on the importance of web page objects and their locations. They grouped consecutive fixations within each AOI into a chunk called a glance to obtain plain sequences of glances coded in AOI. Their interest was to see how often areas of web pages would attract glances by varying the area locations and the types of tasks.
By focusing on the frequency of glances as an indication of importance, they disregarded the length of the chunks, that is, the number of fixations within glances. Also disregarded was the shift of glances, that is, between-chunk sequences. To us, both within- and between-chunk patterns seem to contain rich information worthy of investigation. The information can be extracted from partitioned sequences but not from plain ones. In addition, partitioned sequences will be of great value when some AOIs are nested into broader AOIs (see [16]), given the appropriate coding. The present study is extensible to such a hierarchical structure.
For the sake of simplicity, we will focus on the eye movements of web page viewers, and we will assume that the pages are divided into grid-like AOIs, that the fixations are coded in terms of the areas in which they fall, and that chunks are delimited by isolate gaze points.
shifted or did not shift in a looped transition that represents sustained interest in a given area. In our view, a chunk of fixations reflects continuous interest, and a new one begins after a momentary drift of the gaze. It seems natural to expect the distance distribution of the within-chunk shifts to differ, to some extent, from that of the between-chunk shifts.
The distance analysis explained above exploits information from the cumulative records across all viewers. Hence, it is possible that the results are influenced by some dominant patterns in particular records. If one is interested in sequential regularities often shared among records, frequent sequential pattern mining is useful, as explained below.

Frequent Sequential Pattern Mining.
Among others, we will employ PrefixSpan, developed by Pei et al. [13,14], because of its conceptual compatibility with the partitioned sequences of eye-tracking data. Their approach is briefly explained below using their example, seen in Table 1. (See the Appendix for a more formal explanation.) One can view the data as the eye-tracking records of four viewers in which fixations are alphabetically coded according to the areas of interest (AOI) they fall into: a, b, c, d, e, f, and g.
Codes a, b, c, d, e, and f are all frequent, shared by the majority, whereas code g is infrequent, appearing only once. For further scanning, any infrequent or rare code is to be removed from the records, since it will never appear in frequent patterns according to the Apriori principle [15]. Let us set the level of being frequent at three for illustrative purposes. This level is called the minimum support threshold (abbreviated as ms).
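The first scan described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the contents of `DB` are an assumption (the running example of Pei et al., which Table 1 is said to reproduce), and the helper names `item_support` and `prune` are hypothetical.

```python
from collections import Counter

# Assumed contents of Table 1 (running example of Pei et al.).
# Each record is a list of elements (chunks); each element is a set of codes.
DB = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
]


def item_support(db):
    """Count, for each code, the number of records containing it at least once."""
    counts = Counter()
    for record in db:
        counts.update(set().union(*record))
    return dict(counts)


def prune(db, ms):
    """Drop codes whose support is below ms, and any element left empty."""
    support = item_support(db)
    frequent = {code for code, n in support.items() if n >= ms}
    pruned = []
    for record in db:
        kept = [element & frequent for element in record]
        pruned.append([element for element in kept if element])
    return pruned


support = item_support(DB)
reduced = prune(DB, ms=3)
print(sorted(support.items()))
```

Under the assumed records, only code g falls below the threshold of three, so `reduced` is the database with g removed.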
For every frequent code, one scans the reduced records, devoid of infrequent codes, for patterns prefixed by the given code. Those found for prefix "a" are listed in the second column of Table 1. These are subject to further scanning with respect to the frequent codes at this step, that is, b and c. This process recursively continues until no code is frequent or no patterns remain in the records. Note that prefixes grow in each step like "[a][b]", "[a][c]" in the above example (see the Appendix for a more formal explanation).
Table 2 lists all of the frequent patterns extracted from Table 1 at ms3. Those found at ms3 are included in the patterns at ms2 reported by Pei et al. [13,14]. Inclusive relations generally hold between different ms levels.
Ordinarily, one finds too few patterns at a high ms level and too many at a low level to make an interesting analysis.However, once one recognizes the inclusive relations, making use of multiple levels becomes a plausible solution for identifying strongly frequent patterns as opposed to mildly and weakly frequent ones.(See the Appendix for the relation networks among the patterns identified at ms2, ms3, and ms4.) The present approach is expected to advance eye-tracking research along with conventional heat maps, scan paths, and network analysis recently developed by Matsuda and Takeuchi [8][9][10].

Subjects (Ss).
Twenty residents (seven males and 13 females) living near the AIST Research Institute in Japan were recruited for the experiments. They had normal or corrected vision, and their ages ranged from 19 to 48 years (average 30). Ten of the Ss were university students, five were housewives, and the rest were part-time workers. Eleven Ss were heavy Internet users, while the rest were light users, as judged from their reports about the number of hours they spent browsing online in a week.

Stimuli.
The front (or top) pages of ten commercial web sites were selected from various business areas: airline companies, commerce and shopping, and banking. These were classifiable into three groups according to the layout types [8][9][10]. Due to space limits, we chose four pages with the same layout, consisting of the top and the principal layers. The principal layers were divided into the main area in the middle and subareas on both sides. The layers and the areas differed in size among pages.

Apparatus and Procedure.
The stimuli were presented at a resolution of 1024 × 768 pixels on the 17-inch TFT display of a Tobii 1750 eye-tracking system operating at a rate of 50 Hz. The web pages were randomly displayed to the Ss one at a time for 20 sec. The Ss were asked to browse each page at their own pace. The instructions, in translation, were "Various web pages will be shown on the computer display in turn. Please look at each page as you usually do until the screen darkens, and then click the mouse button when you are ready to proceed." The Ss were informed that the experiment would last for approximately five minutes.

Segment Coding.
A 5 × 5 mesh was superposed on the effective part of each page, after the page was stripped of white margins that had no text or graphics. A uniform mesh was employed for ease of comparison among pages that varied in design beyond the basic layout. The distance of a shift between two segments was measured by the Euclidean distance, computed as $\sqrt{x^2 + y^2}$, where $x$ and $y$ are the number of blocks (i.e., segments) moved along the horizontal and vertical axes.
The rows (and columns) of the mesh were alphabetically (and numerically) labeled in descending order: A through E (and 1 through 5). The segments were coded by combining these labels as seen in Figure 2: A1, A2, ..., A5 for the first row; B1, ..., B5 for the second; and so on through E1, ..., E5 for the fifth row.
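The segment coding and the distance measure can be sketched as below. This is an illustrative sketch under stated assumptions: coordinates are in pixels with the origin at the top-left corner of the effective page area, row A is the top row, column 1 is the leftmost column, and the function names are hypothetical.

```python
import math

ROWS = "ABCDE"  # row labels from top to bottom
N = 5           # 5 x 5 mesh


def segment_code(x, y, width, height):
    """Map a point inside the effective page area to a code such as 'A1' or 'C3'."""
    col = min(int(x * N / width), N - 1)
    row = min(int(y * N / height), N - 1)
    return f"{ROWS[row]}{col + 1}"


def shift_distance(code_from, code_to):
    """Euclidean distance, in blocks, between two segment codes."""
    dy = ROWS.index(code_to[0]) - ROWS.index(code_from[0])
    dx = int(code_to[1]) - int(code_from[1])
    return math.hypot(dx, dy)


print(segment_code(512, 384, 1024, 768))  # centre of a 1024 x 768 area
print(shift_distance("A1", "A1"))          # a loop has distance zero
print(shift_distance("B2", "A3"))          # one block right, one block up
```

A loop (a shift within the same segment) has distance zero, a shift to a horizontally or vertically adjacent segment has distance one, and a diagonal neighbour has distance √2.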

Fixation Sequences.
The raw tracking data for each subject consisted of time-stamped gaze points measured in coordinates. The gaze points were grouped into a fixation point if they stayed within a radius of 30 pixels for 100 msec. Otherwise, they remained isolate.
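One way to realize such a criterion is sketched below. The details are assumptions, not the procedure actually used: the radius is measured from the running centroid of the window, 100 msec is taken to mean at least five consecutive samples at 50 Hz, and when a window is too short only its first point is marked isolate before the scan resumes from the next point. The name `detect_fixations` is hypothetical.

```python
import math


def detect_fixations(points, radius=30.0, min_samples=5):
    """Group consecutive gaze points into fixations.

    points: list of (x, y) tuples sampled at a fixed rate.
    Returns events in temporal order:
      ("fixation", (cx, cy), n_samples) or ("isolate", (x, y), 1).
    """
    events = []
    i = 0
    while i < len(points):
        j = i + 1
        sx, sy = points[i]
        # Grow the window while every point stays within `radius` of the centroid.
        while j < len(points):
            nx, ny = sx + points[j][0], sy + points[j][1]
            cx, cy = nx / (j - i + 1), ny / (j - i + 1)
            if any(math.hypot(px - cx, py - cy) > radius
                   for px, py in points[i:j + 1]):
                break
            sx, sy = nx, ny
            j += 1
        n = j - i
        if n >= min_samples:
            events.append(("fixation", (sx / n, sy / n), n))
            i = j
        else:
            events.append(("isolate", points[i], 1))
            i += 1
    return events


gaze = [(100, 100)] * 6 + [(500, 500)] + [(300, 300)] * 5
events = detect_fixations(gaze)
print([kind for kind, _, _ in events])
```

In this toy stream, the single stray sample between the two clusters remains isolate and would act as a chunk delimiter.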
Each fixation was then translated into a code according to the segment in which it fell. Finally, each fixation sequence was partitioned into chunks using the isolate gaze points as delimiters.
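The partitioning step can be sketched as follows, assuming the record has already been reduced to a temporal stream in which each fixation is represented by its segment code and each isolate gaze point by `None` (a hypothetical encoding chosen for illustration).

```python
from collections import Counter


def partition_into_chunks(events):
    """Split a temporal stream into chunks of fixation codes.

    events: list whose items are a segment code (str) for a fixation
            or None for an isolate gaze point.
    One or more consecutive isolate points act as a single delimiter.
    """
    chunks, current = [], []
    for event in events:
        if event is None:
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(event)
    if current:
        chunks.append(current)
    return chunks


stream = ["A1", "A1", None, "B2", None, None, "B3", "C3", "C3"]
chunks = partition_into_chunks(stream)
length_counts = Counter(len(chunk) for chunk in chunks)
print(chunks)
```

Counting chunk lengths in this way gives the kind of tally of one-, two-, and three-fixation chunks that is examined in the Results.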

Preprocessing the Codes for PrefixSpan.
In accord with the algorithm, the 25 segments were first recoded using letters a through y; then the codes in each chunk were alphabetically ordered with no duplication. In this process, we represented within-chunk loops by extra recoding. Consecutively repeated codes within a chunk were replaced by the corresponding capital letter, for example, [caaababaa] to [cAbabA]. After eliminating duplicates, we sorted the codes within each chunk, for example, [Aabc] from the original sequence. Consequently, we maintained the sequential order among chunks, but the within-chunk sequences could have been distorted. Due to this possibility, we were unable to identify between-chunk loops.
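The recoding can be reproduced with the short sketch below. It assumes that each chunk is given as a string of lowercase codes and that capital letters sort before lowercase ones, which is how the example [Aabc] is ordered; the function names are hypothetical.

```python
from itertools import groupby


def recode_loops(chunk):
    """Replace each run of two or more identical codes by the capital letter."""
    out = []
    for code, run in groupby(chunk):
        out.append(code.upper() if len(list(run)) > 1 else code)
    return "".join(out)


def to_element(chunk):
    """Recode loops, drop duplicates, and sort the codes within the chunk."""
    return "".join(sorted(set(recode_loops(chunk))))


print(recode_loops("caaababaa"))
print(to_element("caaababaa"))
```

The second step discards the order and multiplicity of codes within the chunk, which is the distortion of within-chunk sequences mentioned above.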
Frequent patterns were extracted at three levels of minimum support (denoted as ms12, ms14, and ms16) corresponding to 60, 70, and 80% of the subjects.

Results
The four pages used as stimuli will be referred to as P1, P2, P3, and P4.

Examination of the Chunks.
The total number of chunks did not greatly differ among pages, ranging from 539 (P2) to 592 (P1). The pages agreed well on the lengths and proportions of primary, secondary, and tertiary chunks that contained one, two, and three fixations, respectively. Primary chunks accounted for 53.3 (P4) to 60.4% (P1) of the total chunks, and secondary chunks accounted for 21.9 (P1) to 25.1% (P4). Putting the primary and secondary chunks together, the vast majority of the chunks (≥78.4%) were very short. The proportions of the tertiary chunks were much smaller, ranging from 6.9 (P3) to 12.2% (P4). The longer chunks accounted for 7.9 (P1) to 11.6% (P3).
Loops and one-block shifts were also dominant among the chunks of length three or more. Loops accounted for 49.2 (P4) to 60.8% (P3) of the shifts, and one-block shifts accounted for 34.4 (P3) to 42.7% (P1) of them. Putting these together, the overwhelming majority (≥91.0%) of the shifts within longer chunks were extremely short in distance.

Examination of the Frequent Patterns.
The frequent patterns extracted at three different ms levels (ms12, ms14, and ms16) are inclusive within each page in the sense that (a) subpatterns of a frequent pattern are also frequent at a given level and (b) the patterns extracted at a higher level are included in those at a lower level. For the sake of simplicity, the term "frequent" will be omitted below when obvious. Prior to mining, special coding was applied to the within-chunk loops as explained in Section 2.
In the six patterns of length three found on P2 and P3 at ms12, the constituent codes were partially or totally homogenous. Five of them contained two repeated codes, either A2 or B3, including those prefixed by (A1..) as reported above. The remaining one, found solely on P3, contained A2. In the following examination of the double-chunk patterns, loops will be treated as single codes to reduce complexity.
The double-chunk patterns are listed in Table 4 by the direction of the sequences: upward, homogenous, horizontal, and downward. Superscripts L and R denote leftward and rightward sequences. Underscored patterns were extracted at ms14 and above. Those found only at ms16 are further emphasized in italicized bold face. The total number of patterns varied from 13 (on P2 and P4) to 34 (P3).
The patterns extracted at ms14 and above had no segments in rows D and E and no segments in the fifth column. None of the seven upward and downward sequences were strictly vertical, involving adjacent or nonadjacent columns in the ratio of 4 to 3. These vertical patterns mostly involved adjacent rows (6 out of 7). Some of the constituent segments of the sequences at ms14 and above appeared solely as prefixes (A1 on P1 and P3; A2 on P4) or as postfixes (B3 on P1; B1 and C2 on P2; A3, B3, B4, and C3 on P3; B4 and C4 on P4).
The new double-chunk patterns found at ms12 had (a) segments in row D and in column 5, (b) notable positions of the new segments, (c) increased heterogeneous patterns, (d) increased sequences between nonadjacent rows, (e) strictly vertical sequences, and (f) bilateral sequence pairs. The segments in row D appeared only as postfixes in the downward sequences (D2 and D3 on P1 and P2; D2, D3, and D5 on P3; and D3 on P4). Similarly, the new segments found in row C were postfixes (C3 and C4 on P1; C3 on P2; C1 on P3; and C2 on P4) with a single exception (C2 on P3). The new segments in row B were mostly postfixes: B1, B4, and B5 on P1, B5 on P3, and B4 and B5 on P4. B2 and B3 on P4 were prefixes. An interesting case was B2 on P2, which was a prefix to itself (B2B2). Dual roles were more notable than unary ones among the new segments in row A (A2 and A3 on P1, A1 on P2, and A3 on P4).
A total of seven new upward sequences were found, three on P1 and two on both P2 and P3, but still none on P4. These were prefixed by B2 (on P1 and P3), B3 (P2), or C3 (P3) and postfixed by the segments in row A: A1, A2, A3, or A4. Only C3A2 involved nonadjacent rows. A strictly vertical sequence was present on each of P1, P2, and P3: B2A2, B3A3, and B2A2, respectively. The rest were rightward (B2A3 and B2A4 on P1) or leftward on P2 and P3 (B3A1 on P2; C3A2 on P3). A total of five new homogenous sequences were found on P2 and P3, one in row A (A1A1 on P2), three in row B (B1B1 on P2 and B2B2 on P2 and P3), and one in row C (C3C3 on P3). Like those at ms14 and above, none of the constituents were in columns 4 or 5.

Note (Table 4). The sequence directions are upward (↑), homogenous (==), horizontal (↔), and downward (↓). Underscored patterns were extracted at ms14. Those extracted at ms16 are also emphasized in italicized bold face. Leftward and rightward sequences are marked by superscripts L and R, respectively.
A total of 17 new horizontal sequences were found on P1 (two in row A and four in row B), P2 (two in B), P3 (one in A, three in B, and one in C), and P4 (one in A, two in B, and one in C). A2 and A3 appeared as a prefix or as a postfix, while A4 appeared only as a postfix. The same held for B1, B2, and B3, while B4 and B5 appeared only as postfixes. C2 assumed dual positions in C2C1 on P3 and C3C2 on P4, both of which were leftward. The ratio of leftward to rightward sequences was 2:4, 1:1, 3:2, and 1:3 in the order of P1, P2, P3, and P4.
A total of 29 new downward sequences were found, six on P1, three on P2, 14 on P3, and six on P4. The prefixes concentrated in rows A and B with two exceptions (C3D2 on P3 and C3D3 on P4). In contrast, the postfixes concentrated in rows C and D with exceptions of five patterns on P3 and one on P4. Half or more of the downward patterns on P1, P2, and

Note (Table 5). Primitives in bold face were persistent at two or three ms levels.

Among all of the patterns in Table 4, the heterogeneous sequences were mostly unilateral in that the symmetric pairs were limited in number (B2B3-B3B2 on P1; B1B3-B3B1 on P2; A2A3-A3A2, A2B2-B2A2, A2C3-C3A2, and B2B3-B3B2 on P3; and none on P4). Four of these were horizontal sequences. The constituents were limited to a subset consisting of the first three rows and columns, that is, {A2, B1, B2, B3, C3}.
The individual constituents of the multichunk patterns were frequent by themselves as primitive patterns at a given ms level, but not vice versa. Table 5 lists the isolate primitive patterns not participating in any multichunk pattern at a given ms level. While the number of total primitive patterns monotonically decreased from ms12 to ms16, the ratio of the isolate primitive patterns to the total primitive patterns increased almost monotonically on all pages. The ratios at ⟨ms12, 14, 16⟩ were ⟨4/17, 7/11, 4/5⟩, ⟨4/13, 5/8, 2/4⟩, ⟨0/13, 4/11, 3/6⟩, and ⟨5/17, 9/14, 4/6⟩, in the order of P1, P2, P3, and P4. The sole exception was the second and the third ratios on P2. There were no isolates on P3 at ms12.
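The reported ratios can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; the names `RATIOS`, `LEVELS`, and `violations` are introduced here for illustration.

```python
from fractions import Fraction

LEVELS = ["ms12", "ms14", "ms16"]

# (isolate primitives, total primitives) at ms12, ms14, and ms16, as reported.
RATIOS = {
    "P1": [(4, 17), (7, 11), (4, 5)],
    "P2": [(4, 13), (5, 8), (2, 4)],
    "P3": [(0, 13), (4, 11), (3, 6)],
    "P4": [(5, 17), (9, 14), (4, 6)],
}


def violations(pairs):
    """Return the pairs of adjacent ms levels between which the ratio fails to increase."""
    ratios = [Fraction(isolates, total) for isolates, total in pairs]
    return [
        (LEVELS[k], LEVELS[k + 1])
        for k in range(len(ratios) - 1)
        if ratios[k + 1] <= ratios[k]
    ]


report = {page: violations(pairs) for page, pairs in RATIOS.items()}
totals_decrease = all(
    pairs[0][1] > pairs[1][1] > pairs[2][1] for pairs in RATIOS.values()
)
print(report)
```

The only violation is on P2 between ms14 and ms16 (5/8 versus 2/4), in line with the exception noted in the text, and the totals decrease strictly on every page.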
Generally, an isolate primitive at a given ms level would become a member of sequence(s) at a lower level and would not be present at a higher level. Exceptionally, C5, located in the rightmost column, persisted on P4 as an isolate at all ms levels. Partial persistence was observed between ms14 and ms16 on P1 (A2), P2 (A1, A3), and P4 (B3) as well as between ms12 and ms14 on P4 (B5, C1). No persistence was observed on P3. The persistent ones on P1 and P2 were limited to the first three columns of the top row, {A1, A2, A3}, whereas those on P4 spread over rows B and C in columns 1, 3, and 5, that is, {B3, B5, C1, C5}.
Finally, E3 on P1 at ms12 was the sole frequent segment in the bottom row E where segments were generally infrequent across pages at all ms levels.

Discussion
Eye-tracking researchers have inferred a fixation from gaze points closely clustered in space and time, treating it as a meaningful unit of information processing, that is, a chunk, a familiar concept in psychology. Chunking of lower-level chunks into a higher one is not uncommon as seen in the relationships ⟨letter, word, phrase, sentence, paragraph, ...⟩. The present paper examined the patterns of second-order chunks, that is, chunks of fixations, using isolate gaze point(s) not participating in any fixation as the delimiter. The delimiter was assumed to play an auxiliary role in chunking, like a pause in speech.
Most of the identified chunks were short, consisting of one or two fixations. Also, the transitions within multifixation chunks and between chunks were mostly short in distance, either loops or one-block shifts to adjacent segments. These seem to be attributable to the minimum criterion of the delimiter we employed: at least one isolate gaze point. Hence, even an accidental dislocation of one's gaze resulted in chunking. It would be ideal if we could separate cognitively meaningful chunking from accidental chunking. Until an effective method is established, the best we can do is to be cautious in interpreting the results.
Actually, setting an appropriate criterion is a difficult task due to the possible individual and situational variations. Perhaps individuated criteria will be appropriate instead of a uniform criterion. Further investigation of the distributions of gaze points participating in fixations and those that are isolated is necessary.
As reported earlier, within- and between-chunk transitions were similar in that the first two modal distances were zero (i.e., loops) and one block. However, these differed in order and in magnitude. Loops were primary among within-chunk transitions but secondary among between-chunk transitions. The opposite was true for the one-block shifts. Next, the proportions of the primary and secondary distances of the within-chunk transitions exceeded the respective proportions pertaining to the between-chunk transitions. Similarly, there were more long-distance shifts between chunks than within them.
These results seem to suggest that the attention of our subjects was most likely shifted, after a pause, to an adjacent segment one block away or within the same segment. The medium or long-distance shifts were also separated by pauses, though their proportions were smaller than the short ones. Shifts without a pause, that is, within-chunk shifts, were short, chiefly occurring in the same segment or between adjacent segments one block away.
Now we turn to a discussion of the frequent patterns (i.e., subsequences) extracted by PrefixSpan. The patterns were simple in structure, mostly consisting of single or double chunks. Furthermore, the chunks themselves contained single fixations or single loops as expected from the chunk properties discussed above. More complex structures might have resulted if we had employed less stringent criteria for the delimiter. Even so, beneath the structural simplicity, interesting properties emerged as to the segment differentiation and the directional unevenness in attentional shifts.
First, the within-chunk loops were limited to (A1..), (B1..), and (D1..), all of which were in the leftmost column. While the presence of (D1..) was quite limited, the leading roles of (A1..) and (B1..) as prefixes in the multichunk sequences are noteworthy. These roles might be attributable to menu items placed in the segments. Second, the multichunk sequences chiefly consisted of the segments in rows A, B, and C. In particular, the leading role of A1 on P1 and P3 was noteworthy, like the loop (A1..), though its dual role as pre- and postfix was observed on P2. In contrast, A4, B4, and C4 were consistently positioned as postfixes. The same held for the segments in row D, which appeared only at the lowest ms level. The segments in row E were totally absent in multichunk sequences.
Third, the sequences at ms14 and ms16 were more likely to be horizontal (including homogenous codes) than downward and, to a much lesser extent, than upward; upward sequences remained the least likely among the additional patterns found at ms12. The order between horizontal and downward sequences varied across pages at ms12.
By chunking eye-tracking records into smaller units, we discovered interesting properties of the eye movement of web page viewers. However, further studies seem necessary to enhance the present approach, for example, by setting up nested AOIs to reflect the hierarchical structure of the web objects [16] and by adjusting the chunk delimiters to accommodate individual and task variations. Besides these refinements, we are planning an application of mined frequent patterns to simultaneous clustering [17] of subjects and the properties of their eye movement and other relevant indices.

Appendix
We briefly explain frequent sequential pattern mining by PrefixSpan (prefix-projected sequential pattern mining) developed by Pei et al. [13,14]. Interested readers should consult the original articles for formal descriptions and evaluations in comparison with other competing algorithms.
Let us use Table 1 as the DB (database) to be scanned. It consists of four sequences whose elements are nonempty subsets of the items {a, b, c, d, e, f, g}. PrefixSpan assumes that items in an element are alphabetically ordered with no duplication, for example, [abc], [ac], and [cf].
The goal of PrefixSpan is to find subsequences frequently shared among the records in DB. A subsequence is defined as the list of nonempty subsets of the elements of a given sequence, where the sequential order of elements is preserved. The threshold of frequent occurrence is called the minimum support (abbreviated as ms in this paper). Its value is to be specified by the user.
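The definitions of a subsequence and of support can be made concrete with the sketch below. It is an illustration only: the contents of `DB` are an assumption (the running example of Pei et al., which Table 1 is said to reproduce), and the function names are hypothetical. A pattern is contained in a record when its elements can be matched, in order, to distinct elements of the record that include them as subsets.

```python
# Assumed contents of Table 1 (running example of Pei et al.).
DB = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
]


def contains(sequence, pattern):
    """True if `pattern` is a subsequence of `sequence` (both are lists of sets)."""
    pos = 0
    for element in pattern:
        # Greedily take the earliest remaining element that includes this one.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(db, pattern):
    """Number of records in `db` that contain `pattern`."""
    return sum(contains(record, pattern) for record in db)


print(support(DB, [{"a", "b"}, {"c"}]))  # [ab][c]
print(support(DB, [{"a"}, {"c"}]))       # [a][c]
```

A pattern is frequent at a given ms level when its support is at least ms; under the assumed records, [ab][c] would be frequent at ms2 but not at ms3.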
Subsequences of special importance are a prefix and the associated suffix. For instance, a frequent item a, with ms = 3, can serve as a prefix of the ensuing pattern (i.e., the suffix) to be scanned next. The patterns listed in the second column of Table 1 are the suffix sequences constituting the ⟨a⟩-projected database. Similar databases are to be constructed for every frequent item. With ms2, a and _b together will be considered frequent, where the underline implies a. Hence, [ab] will serve as a prefix, yielding only the two suffixes

Figure 3: Network of the frequent patterns extracted at ms2 (small letters in dark blue), ms3 (large letters in dark red), and ms4 (underscored). Note: [] is omitted for the single-code chunks and is replaced by () for the multiple-code chunks for the sake of simplicity. See the first column of Table 1 for the initial sequences.
The network of the frequent patterns extracted at ms = 2, 3, and 4 is illustrated in Figure 3 to help grasp the inclusive relations among them in two senses: (a) an element of a frequent pattern is also frequent; and (b) a frequent pattern at a given ms level is also frequent at a lower level.
More formally, a sequence $\alpha = \langle e'_1 e'_2 \cdots e'_m \rangle$ of length $m$ is a prefix of another sequence $\beta = \langle e_1 e_2 \cdots e_n \rangle$ of length $n$ ($m \le n$) consisting of frequent elements in the database if and only if the first $m - 1$ elements are identical and the last element of $\alpha$ is a subset of the $m$th element of $\beta$.
The suffix of $\beta$ with regard to $\alpha$ is a sequence, the first element of which is the difference between the $m$th elements of $\beta$ and $\alpha$. The remaining elements of the suffix are identical with the $(m + 1)$th to the last element of $\beta$; that is, the suffix is $\langle (e_m - e'_m)\, e_{m+1} \cdots e_n \rangle$. Scanning with respect to the prefix stops when the suffix becomes nil ($\alpha = \beta$) or no frequent item exists in the projected database. This process is executed in a depth-first manner for every code initially identified as frequent.
It must be noted that some of the extracted patterns may be hard to identify in the original sequences, due to the intermittent removal of infrequent items from the projected database during the process, for example, the extracted pat-

Figure 1: Two fixations in one chunk (a) and in separate chunks (b).

P3 involved nonadjacent rows (A-D/1 and B-D/3 on P1; B-D/2 on P2; and A-C/1, A-D/5, and B-D/2 on P3, where the number after the slash denotes the number of cases), whereas only A2C4 out of six patterns did so on P4. The strictly vertical patterns were limited to columns 2 and 3 (B-D/2 on P1; B-C/1 and B-D/1 on P2; A-B/1, A-D/1, and B-D/1 on P3; and B-C/1 and C-D/ on P4). The rest were rightward on P1 and P4, leftward on P2, or mixed on P3.

Table 2: All of the frequent patterns extracted from Table 1 at ms3.

Table 3: Number of patterns by length (len) by ms. Note. The length of a pattern (len) is the number of constituent chunks. Also listed are the identified within-chunk loops with the number of patterns in which they appeared.

Table 4: Double-chunk patterns by direction.

Table 5: Isolate primitives by ms level.