Tourist Behavior Pattern Mining Model Based on Context

Personalized travel experience and service for tourists has been a hot research topic in the tourism service supply chain. In this paper, we take context into consideration and propose a context-based method for analyzing tourists. First, we analyze the contexts that influence tourist behavior patterns, select the main context factors, and construct the tourist behavior pattern model based on them. Then, we calculate the interest degree of each tourist behavior pattern and mine the rules with high interest degree using an association rule algorithm, so that we can make recommendations that offer tourists better personalized travelling experience and services. Finally, we carry out an experiment to show the feasibility and effectiveness of our method.


Introduction
With the development of the economy and the improvement of people's living standards, more and more people pay attention to the quality of personalized travelling experience and service. In recent years, more and more personalized ways of travelling have emerged in tourism, such as FIT travel and independent travel. The traditional mode of travel service limits the diversity of service options and cannot fully meet the personalized needs of tourists. How to find the laws and features of tourist behavior by mining tourist behavior patterns, and thereby offer tourists better services, has become a problem in the tourism service supply chain.
Many studies concentrate on tourist behavior patterns. Qing analyzed the characteristics of tourism services and the structural properties, constituent elements, and operation mechanism of the tourism service supply chain in the context of modern information technology, and put forward a new conceptual model of the tourism service supply chain based on tourists' personalized demand [1]. Farmaki took Troodos (Cyprus) as a case to study tourist motivation [2]; Martin and Witt proposed a tourism demand forecasting model to represent tourists' cost of living [3]; Smallman and Moore studied tourists' decision making [4]; Kim et al. studied Japanese tourists' shopping preferences with the decision tree analysis method [5].
These studies analyzed tourists only from the viewpoint of psychology and behavioral science and did not consider the set of contexts that influence tourist behavior patterns. In this paper, therefore, we take context into consideration and propose a context-based method for analyzing tourists, in order to find the relationship between travel services and context and to analyze the important contexts that influence tourist behavior. To mine rules with high interest degree using an association rule algorithm and to make recommendations that offer tourists better personalized travelling experience and services, we propose a method based on a network diagram, which clearly reflects the relationships among the contexts that influence tourist behavior. With this method, we delete the tourist behavior patterns with low interest degree; then, we use the Apriori algorithm to mine association rules from the tourist behavior patterns with high interest degree. Finally, we carry out an experiment to show the feasibility and effectiveness of our method.

Context.
There are many definitions of context, and many researchers work on it. Schilit et al. defined context as the identities and changes of location, people, and objects around them [6]. Brown et al. thought that context should be defined as the symbols around people or other objects, such as location, time, season, temperature, and so on [7]. In [8], the definition of context was extended to the feature information of the situation of some objects, such as people, location, and so on. Snowdon and Grasso defined context as a multilevel structure, mainly including the individual layer, the project layer, the group layer, and the organization layer [9]. Gu thought that context would respond to the transformation from computer-centered to people-centered computing; in fact, he defined context as a spectrum in his paper, as shown in Figure 1. He divided context into computing context (such as communication bandwidth), user context (such as location), physical context (such as weather and temperature), time context (such as hour), and social context (such as law) [10].
In this paper, we regard context as the set of factors that influence the tourist behavior pattern; different contexts lead the tourist to different behavior patterns. We take the following contexts into consideration: user, location, time, device, and service type.

Association Rule and Apriori Algorithm.
There are many association rule algorithms, and they can be divided into two classes: the first focuses mainly on improving the analytical efficiency of association rules; the other pays more attention to the application of association rule algorithms, to how to deal with numeric-valued variables, and to extending association from a single concept layer to multiple concept layers so as to further reveal the inner structure of objects.
The Apriori algorithm is one of the classical association rule algorithms; the earliest Apriori algorithm was proposed by Agrawal et al. [11]. The algorithm mainly includes two parts: producing frequent itemsets and producing association rules from the frequent itemsets. The algorithm scans the database, accumulates the count of each item, collects the items that meet the minimum support (min_sup), and thereby finds the frequent 1-itemsets, denoted $L_1$. Then, the algorithm uses $L_1$ to find the frequent 2-itemsets $L_2$, uses $L_2$ to find the frequent 3-itemsets $L_3$, and so on, until no more frequent $k$-itemsets can be found. A rule derived from these frequent itemsets is defined as a strong association rule if it reaches the minimum confidence [12]. Since the association rule algorithm was proposed, it has been improved and applied in many fields. For example, Kang et al. applied the association rule algorithm in the smart home [13], and Zhang et al. used an improved association rule algorithm in university teaching management [14].
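The level-wise search described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of [11]: the function names `apriori` and `strong_rules` and the way transactions are represented (as sets of item labels) are assumptions introduced here.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # fraction of transactions that contain the whole itemset
        return sum(itemset <= t for t in transactions) / n

    # L1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_sup}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # join step: unions of two frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset must be frequent, then check support
        level = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_sup
        }
    return frequent

def strong_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_conf:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

Calling `apriori(transactions, 0.2)` followed by `strong_rules(frequent, 0.8)` mirrors the two parts of the algorithm: frequent itemset generation first, rule generation second.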

Modeling and Mining Method for Tourist Behavior Pattern Based on Context
At the same time, we take into consideration the different capabilities of the mobile devices that customers use, such as screen size, battery durability, and access bandwidth. We consider that these capabilities influence the mobile customer behavior pattern directly or indirectly. To verify this, we carried out the following experiment. In a particular context, we observed the behavior patterns of three customers who used different devices and recorded their movement trajectories and the type and time of the services they requested. We obtained the customer movement trajectories shown in Figure 2 and the service request information shown in Table 1. We can conclude from Figure 2 that customers have different behavior patterns when they use different mobile devices. For example, when user $u_1$ used device $d_1$, his movement trajectory was $l_2 \to l_6 \to l_8 \to l_9$; when he used device $d_2$, his movement trajectory changed to $l_2 \to l_6 \to l_9$. We can also conclude from Table 1 that a customer requested different services at the same time, or requested the same service at different times, when he used different devices. For example, when user $u_1$ used device $d_1$ at time $t_3$, he requested service $s_3$; when he used device $d_2$ at time $t_3$, he requested service $s_4$. User $u_3$ requested service $s_2$ at location $l_4$ when he used device $d_1$ and requested service $s_2$ at location $l_3$ when he used device $d_2$. From these analyses, we conclude that a mobile customer has different movement trajectories, requests different services at the same time, and requests the same service at different places when he or she uses different devices. We therefore take the mobile device as a context factor that influences the mobile customer behavior pattern.
Other context factors also influence the mobile customer behavior pattern, such as the physical environment in which the customer stays (weather, temperature, humidity, and so on) and the social situation in which the customer is involved (e.g., manners, customs, and laws).
We use a questionnaire to determine the main context factors. The questionnaire contains nine questions, each of which involves one context factor that may influence the tourist behavior pattern. From these questions, we can study which contexts influence the tourist behavior pattern most. A total of 102 individuals, all of them tourists, participated in the survey. After collecting the questionnaires, we used SPSS to analyze the results. We assigned each option a weight from 1 to 5 and averaged the weights to obtain the degree to which each context influences behavior. The results are shown in Figure 3. In this paper, we therefore choose the following five context factors as the main context factors: tourist (user), device, location, time, and service.

Modeling the Tourist Behavior Pattern Based on Context.
The preceding part of this paper briefly analyzed the context factors that influence the tourist behavior pattern; we now build a model based on these context factors. In the following, we first give the relevant definitions of the tourist behavior pattern and then construct a context-based model of the tourist behavior pattern.
Definition 1 (tourist user). $U = \{u_1, u_2, u_3, \ldots, u_n\}$ is the set of all users; every user denotes a person who uses a mobile device to request mobile service messages from the mobile service supplier while travelling.
Definition 2 (devices used by the tourist). The device set is the set of devices that users use to request mobile services and is defined as $D = \{d_1, d_2, d_3, \ldots, d_h\}$. According to the previous definitions, this paper represents one tourist behavior as the tuple $\{u, d, t, l, t_l, s, t_s\}$, where $u$ is an element of the tourist user set $U$, $d$ is an element of the device set $D$, $t$ is an element of the time set $T$, $l$ is an element of the location set $L$, $t_l$ is the time for which the tourist sojourns at location $l$, $s$ is an element of the service message set $S$, and $t_s$ is the time for which the tourist requests tourism services.
In graph theory, a network is a structure composed of nodes and edges. Every edge carries a quantitative index related to its nodes; this index is normally called a weight and can denote distance, expense, carrying capacity, and so on [18]. In other words, a network is composed of nodes and weighted edges. Taking advantage of this structure, this paper uses the context factors that influence the tourist behavior pattern as the nodes of the network, the connection relationships among the context factors as the edges of the network, and the connect coefficients among different context factors as the weights of the edges (the specific connection relationships and connect coefficients are described in detail in the following part of this paper). In this way, the behavior pattern of a tourist can be clearly portrayed in the network, as shown in Figure 4.
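As a concrete illustration of this layered network, the following Python sketch builds weighted edges between adjacent layers from a list of behavior records. The record layout `(user, device, location, time, service)`, the function name `build_network`, and the use of a plain occurrence count as the edge weight are assumptions made for the example; the connect coefficients actually used by the method are given in Definition 6.

```python
from collections import defaultdict

def build_network(records):
    """Map each edge between adjacent layers to the number of times it occurs."""
    edges = defaultdict(int)
    for record in records:
        # pair every node with the node of the next layer in the same record
        for src, dst in zip(record, record[1:]):
            edges[(src, dst)] += 1
    return dict(edges)

# two hypothetical behavior records of the same user
records = [
    ("u1", "d1", "l3", "t17", "s3"),
    ("u1", "d2", "l3", "t17", "s3"),
]
network = build_network(records)
```

Edges that occur in both records, such as `("l3", "t17")`, accumulate a larger weight than edges that occur in only one of them.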

Tourist Behavior Pattern Mining Method Based on the Network.
The preceding part of this paper analyzed the structure of the network model of tourist behavior; in the following, we first give the related definitions and then the specific steps of the network-based tourist behavior pattern mining method.

Basic Definitions.
To explain the mining method more clearly, we first give the relevant definitions.
Definition 6 (connect coefficient). The connect coefficient denotes the connection relationship between two different attributes; the specific connect coefficients are those of $U \bowtie D$, $D \bowtie L$, $L \bowtie T$, and $T \bowtie S$. The connect coefficient of $U \bowtie D$ denotes the number of connections between a mobile user $u$ and a device $d$. The connect coefficient of $D \bowtie L$ denotes the number of connections of a device $d$ at a location $l$. The connect coefficient of $L \bowtie T$ denotes the time for which a user sojourns at location $l$. The connect coefficient of $T \bowtie S$ denotes the time for which a mobile user requests services.

Definition 7 (interesting locations and interesting services).
When the length of time for which a tourist sojourns at a place is larger than the threshold value we set, we consider that the tourist is interested in this place. Similarly, when the length of time for which a mobile user requests a service is larger than a threshold value, or the number of connections is larger than a threshold value, we consider that the mobile user is interested in this service. Usually, the time threshold is set to 30 minutes and the connection threshold is set to 10 times.
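The two tests in Definition 7 amount to simple threshold comparisons. The sketch below uses the usual values quoted above (30 minutes and 10 connections) as defaults; the function and parameter names are hypothetical.

```python
TIME_THRESHOLD_MINUTES = 30   # usual time threshold from Definition 7
CONNECTION_THRESHOLD = 10     # usual connection-count threshold from Definition 7

def is_interesting_location(sojourn_minutes, threshold=TIME_THRESHOLD_MINUTES):
    """A location is interesting if the tourist stays longer than the threshold."""
    return sojourn_minutes > threshold

def is_interesting_service(request_minutes, connections,
                           time_threshold=TIME_THRESHOLD_MINUTES,
                           count_threshold=CONNECTION_THRESHOLD):
    """A service is interesting if either the request time or the count is large."""
    return request_minutes > time_threshold or connections > count_threshold
```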
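In the connect edge value of $U \bowtie D$ (Definition 9), $n$ denotes the number of devices. Similarly, the connect edge value of $D \bowtie L$ is the ratio of the number of connections between a device and a location to the total number of connections between that device and all locations in the location set; the connection counts are those between devices and locations in the same behavior pattern of a mobile user.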

The Connect Edge Value of 𝐿 ⋈ 𝑇
In this edge value, the term for timestamp $h$ and its sum over the 24 timestamps, $h = 1, \ldots, 24$, denote the time for which a mobile user requests services at a given place in his or her behavior pattern.

The Connect Edge Value of 𝑇 ⋈ 𝑆
In this edge value, the sum runs over the services in the connected service set, and the term for timestamp $h$ and its sum denote the time for which a mobile user requests services in his or her behavior pattern. An edge is deleted if its connect edge value is smaller than a threshold value. A behavior pattern is not involved in the calculation of the connect edge value if it contains interesting locations or interesting services.
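Under assumed notation, the four connect edge values described above can be written as normalized ratios. In this sketch, $c(x, y)$ stands for the connect coefficient of Definition 6 on the edge between nodes $x$ and $y$, and $w(x, y)$ for the resulting connect edge value. Both symbols, the index names, and the count $r$ of services are assumptions chosen to match the verbal description of the $U \bowtie D$ case; they are not the authors' exact formulas.

```latex
% n devices, m locations, 24 hourly timestamps, r services (r is an assumed name)
\begin{align*}
w(u_i, d_j) &= \frac{c(u_i, d_j)}{\sum_{j'=1}^{n} c(u_i, d_{j'})}, &
w(d_j, l_k) &= \frac{c(d_j, l_k)}{\sum_{k'=1}^{m} c(d_j, l_{k'})}, \\
w(l_k, t_h) &= \frac{c(l_k, t_h)}{\sum_{h'=1}^{24} c(l_k, t_{h'})}, &
w(t_h, s_q) &= \frac{c(t_h, s_q)}{\sum_{q'=1}^{r} c(t_h, s_{q'})}.
\end{align*}
```

Each value lies between 0 and 1, which is what allows a single threshold to be applied to edges whose raw coefficients are counts on some layers and durations on others.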
Definition 10 (connect edge coefficient). When a repeated edge appears, its edge value is made up of contributions from several behavior patterns; the connect edge coefficient denotes the influence that one behavior pattern has on this edge. Its value equals the ratio of the connect coefficient of this behavior pattern on the edge to the sum of the connect coefficients of all behavior patterns of the same mobile user on this edge.
Definition 11 (interesting degree id). The interesting degree id is an index that reflects the degree of interest of a mobile user behavior pattern. Specifically, it equals the sum of the weights of the edges of the tuple $(u, d, l, t, s)$, with one product term for each of the four edges $U \bowtie D$, $D \bowtie L$, $L \bowtie T$, and $T \bowtie S$. If the value of the interesting degree id is smaller than a threshold value $th_1$, we regard this mobile user pattern as having a low interest level and delete it from the network. The threshold values can be set as needed; the larger they are, the higher the interest degree of the resulting rules. As illustrated in Figure 5, the result can be used as a behavior prediction model to predict the future behavior pattern of a mobile user. If a behavior pattern contains interesting locations or interesting services, we regard it as a high-interest behavior pattern without calculating the specific value of its interestingness.
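The classification by interesting degree can be sketched as follows. The sketch assumes that each of the four product terms multiplies the connect edge coefficient (Definition 10) by the connect edge value (Definition 9) of one edge; this reading, the function names, and the dictionary layout are assumptions. The default thresholds 0.8 and 1 are the values used in the experiment of this paper.

```python
def interest_degree(edge_coefficient, edge_value):
    """Sum, over the edges of one pattern, of coefficient times value (assumed form)."""
    return sum(edge_coefficient[e] * edge_value[e] for e in edge_coefficient)

def interest_level(degree, th1=0.8, th2=1.0, has_interesting_item=False):
    """Classify a pattern as 'low', 'common', or 'high' interest."""
    if has_interesting_item:
        # patterns with interesting locations or services are high by definition
        return "high"
    if degree < th1:
        return "low"
    if degree > th2:
        return "high"
    return "common"
```

Patterns classified as `"low"` would be dropped before association rule mining, while `"common"` and `"high"` patterns are kept.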

Mining Steps
First Step (collecting data). To mine tourist behavior patterns, we must collect data about the tourists. We obtain the information table shown in Table 3. When an edge between adjacent layers repeats several times, its connection coefficients are added. For example, device $d_1$ connects to location $l_3$ in two situations, with coefficients 4 and 2, so the connection coefficient of the edge $d_1 l_3$ equals 4 + 2 = 6.
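The addition of coefficients on a repeated edge can be reproduced with a small accumulator. The tuple layout `(source, target, coefficient)` and the function name are assumptions for this sketch; the values 4 and 2 on the edge from $d_1$ to $l_3$ are those of the example above, and the third observation is made up.

```python
from collections import Counter

def connection_coefficients(observations):
    """Add up the coefficients of edges that are observed more than once."""
    totals = Counter()
    for src, dst, coefficient in observations:
        totals[(src, dst)] += coefficient
    return dict(totals)

observations = [("d1", "l3", 4), ("d1", "l3", 2), ("d2", "l3", 1)]
coefficients = connection_coefficients(observations)
```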

Fourth Step. Considering that different customers have different behavior patterns, we place each user in a separate group and calculate the collection weight according to Definition 8; when the collection weight is less than the threshold, the edge is deleted.
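A minimal sketch of this step, for the edges of a single user, is given below. It assumes that the collection weight of an edge is its connection coefficient divided by the total coefficient of all edges leaving the same source node, and it uses 0.2 (the threshold of the experiment in this paper) as the default cut-off; the function names and data layout are hypothetical.

```python
from collections import defaultdict

def edge_weights(coefficients):
    """Normalize each coefficient by the total leaving the same source node."""
    totals = defaultdict(float)
    for (src, _dst), c in coefficients.items():
        totals[src] += c
    return {
        edge: c / totals[edge[0]]
        for edge, c in coefficients.items()
        if totals[edge[0]] > 0
    }

def prune_edges(weights, threshold=0.2):
    """Keep only the edges whose weight is not less than the threshold."""
    return {edge: w for edge, w in weights.items() if w >= threshold}
```

For example, coefficients of 6, 1, and 3 on three edges leaving the same device give weights 0.6, 0.1, and 0.3, so only the second edge is deleted.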

Fifth Step. Calculate the interest degree of the remaining customer behavior patterns according to Definition 11, and set the low interest degree threshold $th_1$ and the high interest degree threshold $th_2$. When the interest degree of a customer behavior pattern is less than $th_1$, the pattern is discarded; the patterns with general and high interest degree are carried into the next step.

Sixth Step. Use the Apriori algorithm to mine frequent patterns from the patterns with general and high interest degree, and mine the association rules with high support and confidence; these rules can be used to forecast customers' future behavior or to recommend services to mobile customers.
To show the availability of our method, we propose the concept of "coverage," which means the ratio of the number of rules produced both by our model and by direct mining to the number of rules produced by direct mining. If the coverage is larger than a threshold, we say that the proposed method is available. Generally, the larger the threshold that is met, the more available the method is. In this paper, we set the threshold to 80%.
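The coverage test can be expressed directly with set operations. In the sketch below, rules are represented by arbitrary hashable labels, and the names `coverage` and `is_available` are assumptions; the default threshold of 0.8 is the 80% stated above.

```python
def coverage(model_rules, direct_rules):
    """Share of directly mined rules that the model also produces."""
    direct = set(direct_rules)
    if not direct:
        raise ValueError("direct_rules must not be empty")
    return len(direct & set(model_rules)) / len(direct)

def is_available(model_rules, direct_rules, threshold=0.8):
    """The method counts as available when coverage exceeds the threshold."""
    return coverage(model_rules, direct_rules) > threshold

# hypothetical rule labels: the model reproduces 11 of 12 direct rules
direct_rules = {f"r{i}" for i in range(1, 13)}
model_rules = {f"r{i}" for i in range(2, 13)} | {"extra1", "extra2"}
```

Rules that only the model produces (here `extra1` and `extra2`) do not lower the coverage, because the ratio is taken over the directly mined rules.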

Example and Analysis.
We take the West Lake of Hangzhou as an example to illustrate the application of the model, in which GPS and RFID are used to provide personalized services that match the requirements and preferences of the user. We select part of the tourist behavior data from the West Lake of Hangzhou Scenic Area Management Committee, as shown in Table 4.
To verify the effects of the proposed method, we use two metrics: interest degree and coverage. We can conclude from Table 4 that patterns 7, 12, and 18 contain interesting locations or interesting services, and we regard them as behavior patterns with high interest degree. To show the process of our method, we choose the patterns of three users, design a network diagram with five layers, connect the adjacent layers, and then calculate the connection coefficients according to Definition 7, as shown in Figure 7.
Then, we place each user in a separate group and calculate the collection weight according to Definition 8; the edges whose weight is less than 0.2 are deleted, and the patterns on the remaining edges are kept. In this paper, we set the low interesting degree threshold $th_1$ to 0.8 and the high interesting degree threshold $th_2$ to 1. We delete the low interesting degree patterns and obtain the patterns with high interesting degree, as shown in Table 5. Then, we use the Apriori algorithm to mine rules from the high interesting degree patterns; we set the minimum support to 20% and the minimum confidence to 80% and obtain the results shown in Table 6.
The lift denotes the ratio of the confidence of a rule to the support of its consequent item; the computational formula is $\mathrm{lift}(X \to Y) = \mathrm{conf}(X \to Y) / \mathrm{sup}(Y)$. The lift reflects the degree to which the appearance of the antecedent item $X$ influences the appearance of the consequent item $Y$. Generally, the lift value should be larger than 1, which means that the antecedent item $X$ has a positive influence on the appearance of the consequent item $Y$. The larger the lift value is, the better the rule is.
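The lift of a rule can be computed from raw transactions as in the sketch below. The helper names and the toy transactions are assumptions; the code also assumes that both the antecedent and the consequent occur in at least one transaction, since otherwise the ratios are undefined.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= frozenset(t) for t in transactions) / len(transactions)

def lift(antecedent, consequent, transactions):
    """Confidence of antecedent -> consequent divided by support of the consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    confidence = support(both, transactions) / support(antecedent, transactions)
    return confidence / support(consequent, transactions)

# hypothetical transactions, not the data of Table 4
transactions = [
    {"t17", "s3", "l3"},
    {"t17", "s3", "l3"},
    {"t16", "s4", "l2"},
    {"t16", "l2"},
    {"t15", "s1", "l1"},
]
rule_lift = lift({"t17", "s3"}, {"l3"}, transactions)
```

In this toy data the rule always holds when its antecedent occurs, and the consequent appears in only two of five transactions, so the lift is well above 1.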
From Table 6, we can see that the method proposed in this paper yields 39 association rules. These rules were obtained from the high interesting degree patterns, so we regard them as interesting rules. The rule with the maximum lift is time = $t_{17}$ and service = $s_3$ → location = $l_3$, with a lift of 4.5. This means that this association rule has the highest practical guidance value, so it is considered first when the resulting rules are used. We can use these association rules to recommend services to tourists and thus offer them better services; for example, using the rule location = $l_2$ and time = $t_{16}$ → service = $s_4$, we can recommend $s_4$ to a tourist who is in the context location = $l_2$ and time = $t_{16}$. In this paper, the service $s_4$ is the tourism route guide, so we can send the tourism route guide to the tourist, as shown in Figure 11.
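Applying the mined rules to a tourist's current context can be sketched as a simple lookup. The rule dictionary layout, the function name `recommend`, and the lift of 2.0 attached to the first rule are assumptions for illustration; only the two rule bodies and the lift of 4.5 come from the text above.

```python
def recommend(context, rules):
    """Return the consequents of all rules matched by the context, highest lift first."""
    # a rule matches when every condition of its antecedent holds in the context
    matches = [r for r in rules if r["if"].items() <= context.items()]
    matches.sort(key=lambda r: r["lift"], reverse=True)
    return [r["then"] for r in matches]

rules = [
    # lift of this rule is a made-up placeholder
    {"if": {"location": "l2", "time": "t16"}, "then": {"service": "s4"}, "lift": 2.0},
    {"if": {"time": "t17", "service": "s3"}, "then": {"location": "l3"}, "lift": 4.5},
]
context = {"user": "u1", "location": "l2", "time": "t16"}
suggestions = recommend(context, rules)
```

Sorting by lift reflects the statement that the rule with the highest lift is considered first when several rules apply.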

Comparison and Discussion.
To verify the effects of the method proposed in this paper, we apply the Apriori algorithm, the GRI algorithm, the CARMA algorithm, and the Predictive-Apriori algorithm to the original data (here, too, we set the minimum support to 20% and the minimum confidence to 80%), and we obtain the rules shown in Tables 7, 8, 9, and 10.
Comparing Table 6 with Tables 7 and 8, 11 of the rules in Table 7 also appear in Table 6 (the rules marked in yellow in Table 6), and all the rules in Table 8 appear in Table 6. We therefore consider that the method we proposed for mining the mobile customer behavior pattern is effective: in this experiment, the coverage of the method is about 91.67% (11/12) with respect to the Apriori algorithm and 100% (6/6) with respect to the GRI algorithm, both of which are larger than the threshold of 80% set before. This means that the proposed method is feasible and effective. Apart from the 11 rules shared with Table 7, Table 6 contains 28 other rules, which have the feature of high interest, so they provide more choices to the service provider and more services to the mobile customer. The rule with the maximum lift is the same in Tables 6 and 7, namely time = $t_{17}$ and service = $s_3$ → location = $l_3$; this means that the results of the proposed method are consistent with those of the classical Apriori algorithm. Finally, the rule with ID = 1 in Table 7, location = $l_1$ → user = $u_1$, is the only rule that is not included in Table 6. Although this rule meets the minimum support and the minimum confidence, the pattern containing $\{u_1, l_1\}$ is a low interesting pattern as defined before, so the rule location = $l_1$ → user = $u_1$ is an uninteresting rule. Our method can reject uninteresting rules like this. This analysis shows that the proposed method is more feasible and advanced than the Apriori algorithm.

Conclusion
In this paper, we comprehensively considered the context factors that influence the tourist behavior pattern, such as the device the tourist uses, time, location, and service type, and obtained the set of contexts that influences the tourist behavior pattern. We then proposed a method for mining tourist behavior patterns based on a network diagram. The method first constructs a network diagram, then extracts the behavior patterns with high interesting degree, and finally mines association rules from these patterns. We carried out an experiment to show the feasibility and effectiveness of our method. In the experiment, we set the low interest degree threshold $th_1$ to 0.8 and the high interest degree threshold $th_2$ to 1 and deleted the low interest patterns; we then applied the Apriori algorithm to the remaining patterns and obtained 39 rules, which can be used to make recommendations to tourists. Compared with the results obtained without this method, it has the following advantages: (1) it keeps the interesting rules and deletes the uninteresting rules in the results; (2) it produces many other interesting rules, which

Figure 2: Movement trajectories of customers when they use different devices.

Figure 3: The results of the questionnaire.

Figure 4: The network model of the tourist behavior pattern.

Table 2: Timestamp table.

Definition 3 (location). The location set denotes the places through which the tourist moves, and we use the set $L = \{l_1, l_2, l_3, \ldots, l_m\}$ to represent it.
Definition 4 (service). The service set is the set of messages with which the tourist requests tourism services from the suppliers, and we use the set $S = \{s_1, s_2, s_3, \ldots, s_k\}$ to represent it.
Definition 5 (timestamp, sojourn time, and service request time). To represent approximately the time period in which a tourist behavior pattern forms, this paper simply divides the 24 hours of a day into 24 time intervals, as shown in Table 2; every time interval denotes one hour, and each hour denotes one timestamp. The sojourn time $t_l$ denotes the time for which the user sojourns at a place; the service request time $t_s$ denotes the time for which the tourist requests tourism services.
Definition 8 (repeated edge). A tourist may have the same connection edge in two different behavior patterns; such an edge is called a repeated edge in this paper. For example, the behavior patterns $\{u_1, d_1, l_3, t_{17}, s_3\}$ and $\{u_1, d_2, l_3, t_{17}, s_3\}$ have two repeated edges, namely, $l_3 t_{17}$ and $t_{17} s_3$.
Definition 9 (connect edge value). The connect edge value is a standardized value obtained by normalizing the connect coefficient (Definition 6), for the case where the different magnitudes of the input variables would affect the final mining result. In this paper we use "⋈" to represent the connection relationship between different attributes, and the specific weights are those of $U \bowtie D$, $D \bowtie L$, $L \bowtie T$, and $T \bowtie S$; the computational formulas of the edge weights are as follows.
Connect Edge Value of $U \bowtie D$. The connect edge value of mobile user $u_i$ and device $d_j$ equals the ratio of the number of connections between user $u_i$ and device $d_j$ to the total number of connections between user $u_i$ and the device set.
Patterns with a low interest level are deleted from the network. If the value of the interesting degree id is larger than another threshold value $th_2$, we regard this mobile user pattern as having a high interest level. In this way, we divide mobile user behavior patterns into three groups, namely, low level of interest, common level of interest, and high level of interest.
The information table (Table 3) is obtained by collecting user data, mainly including the tourist, the mobile device, the location, the number of connections, the time, the time for which the user stays at the location, the service type the user requests, and the time for which the user requests the service.
Second Step. Design a network diagram with five layers corresponding to the user, device, location, time, and service attributes, respectively; the number of nodes in each layer corresponds to the number of values of the attribute, as shown in Figure 6.
Third Step. Connect the adjacent nodes and mark each connection coefficient as in Definition 7; when an edge repeats several times, its connection coefficients are added.

Table 4: Tourist behavior information table.
Calculate the collection weight of tourist $u_1$.