Websites are increasingly personalized for each user, based on the user’s interests and behavior, to enrich e-business. The main challenges of online usage data are information overload and its dynamic nature. To address these issues, this paper proposes WebBluegillRecom-annealing, a dynamic recommender system that combines web usage mining techniques with software agents to provide dynamic recommendations that can be used for customizing a website. The proposed recommender uses swarm intelligence modeled on the foraging behavior of bluegill fish. It overcomes information overload by handling the dynamic behaviors of users. The system was compared against traditional collaborative filtering systems. The results show that the proposed system has higher precision, coverage,
Customer relationship management (CRM) entails the interaction of an organization with its current and future customers. Competition in e-business requires efficient management of web usage data, because a competitor’s website may be only one click away. An improved understanding of customers’ interests and behaviors increases the profit of an organization. A website personalized to a customer’s interests may hold the customer’s attention longer and thus increase customer utility. Information about customers’ interests and behavior also helps a website administrator personalize or customize a web page for a user. The growing use of business websites generates a huge amount of web usage information, causing information overload. To manage this overload, efficient data mining techniques can be applied in addition to storing, retrieving, and managing the web usage data. These techniques may also be used to identify interesting patterns in web log data or online usage data.
The major challenges of online web usage data, in addition to information overload, are its high dimensionality and its dynamic nature, both caused by thousands of users. Online usage data is high dimensional because it contains a huge number of clicks made by users to purchase items. It also reflects human interests, which are highly dynamic. These dynamic behaviors may be due to changes in a user’s interest or to the addition or deletion of web pages in a website. Web personalization for a user should also cope with these issues.
A suitable recommender system can be very helpful in web personalization: the website uses the system’s recommendations to present users with their items of interest. Much research has been done on recommender systems, but most traditional recommender systems cannot handle the dynamic nature of online usage data.
Moreover, traditional recommender systems give limited recommendations. They need many iterations before convergence, and the quality of their recommendations drops as the number of users increases. They also cannot balance quality measures such as coverage and precision.
To overcome these issues, we propose the WebBluegillRecom-annealing dynamic recommender system. It uses a swarm intelligence approach: recommendations are based not only on a user’s own interests but also on the interests of neighboring users. It also overcomes the overspecialization problem of many traditional recommender systems by providing variety in its recommendations.
The performance of the proposed algorithm is compared with traditional collaborative filtering recommender systems. The evaluation results show that the proposed dynamic recommender system gives better predictions in less time without losing quality in terms of coverage, precision,
The rest of the paper is organized as follows. Section
Ben Schafer et al. [
Web usage mining is the process of applying data mining techniques to web log data to discover interesting usage patterns [ Preprocessing the web log files. Pattern discovery using data mining techniques [ Postprocessing. Tracking evolving user profiles [
Each entry in the web log files consists of an IP address, the URL viewed, and the access time. The web log files extracted from the web server contain a huge amount of information, not all of which is needed for further processing. The quality of the patterns discovered by web usage mining depends on how well data cleaning and user session identification are performed. Data cleaning includes filtering out crawler requests and requests for graphics, and identifying unique sessions. User session identification means identifying the pages referenced by a user during a single visit to a site.
Once the user sessions are identified, various data mining methods such as frequent item sets, clustering, classification, association rule mining, path analysis, neural network approaches, and heuristic approaches can be applied to extract useful patterns from web log files. The discovered patterns identify users’ interests, behavior, habits, and changes in interest. A website can be personalized or customized for a user based on this information, thereby increasing the profit of an organization.
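The cleaning and session identification steps described above can be sketched as follows. This is a minimal illustration, not the preprocessing used in the paper: the user-agent field, the crawler hints, the list of graphics suffixes, and the 30-minute inactivity timeout are all assumptions introduced for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed markers for graphics requests and crawlers (not from the paper).
GRAPHICS_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".css")
CRAWLER_HINTS = ("bot", "crawler", "spider")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff


def clean(entries):
    """Drop crawler requests and requests for graphics files."""
    kept = []
    for ip, url, when, agent in entries:
        if url.lower().endswith(GRAPHICS_SUFFIXES):
            continue
        if any(hint in agent.lower() for hint in CRAWLER_HINTS):
            continue
        kept.append((ip, url, when))
    return kept


def sessionize(entries, timeout=SESSION_TIMEOUT):
    """Group each IP's requests into sessions split by inactivity gaps."""
    by_ip = defaultdict(list)
    for ip, url, when in sorted(entries, key=lambda e: (e[0], e[2])):
        by_ip[ip].append((when, url))
    sessions = []
    for ip, hits in by_ip.items():
        current = [hits[0][1]]
        for (prev_t, _), (t, url) in zip(hits, hits[1:]):
            if t - prev_t > timeout:
                # A long gap closes the current session and starts a new one.
                sessions.append((ip, current))
                current = []
            current.append(url)
        sessions.append((ip, current))
    return sessions


# Hypothetical log entries: (IP address, URL, access time, user agent).
t0 = datetime(2024, 1, 1, 9, 0)
log = [
    ("10.0.0.1", "/courses/cse309/notes.html", t0, "Mozilla/5.0"),
    ("10.0.0.1", "/img/logo.png", t0 + timedelta(seconds=1), "Mozilla/5.0"),
    ("10.0.0.1", "/courses/cse309/slides.html", t0 + timedelta(minutes=5), "Mozilla/5.0"),
    ("10.0.0.1", "/courses/cse312/notes.html", t0 + timedelta(hours=2), "Mozilla/5.0"),
    ("10.0.0.2", "/courses/cse475", t0, "ExampleBot/1.0"),
]
sessions = sessionize(clean(log))
```

In this example the image request and the crawler request are discarded, and the two-hour gap splits the remaining requests of the first visitor into two sessions.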
User session categories [
Swarm intelligence gains inspiration from several communities in nature such as fish schools, ant colonies, honey bees, and bird flocks. Swarm intelligence uses intelligent agents to handle copious information. An agent perceives the environment through sensors and it acts on the environment through actuators [
Simulated annealing [
In the past, many research works were done in swarm intelligence for web usage mining like AntClust [
In the present work, we propose a WebBluegillRecom-annealing dynamic recommender system. It uses simulated annealing and swarm intelligence to identify the interesting items to recommend to users. The WebBluegillRecom-annealing algorithm is inspired by the foraging behavior of bluegill fish. Swarm intelligence uses intelligent agents to handle the abundant information on the web, thereby increasing scalability. Here, intelligent software agents are used to model artificial life. They can handle the dynamic nature of online usage, thereby overcoming the information overload problem. Their flexibility permits the artificial bluegill fish to model the foraging behavior of real bluegill fish at different densities of prey in the water. The learning capability of software agents allows continuous monitoring of users’ dynamic behaviors and the generation of predictions.
The proposed WebBluegillRecom-annealing algorithm uses a cooling schema to bring all agents into a stable state. The cooling algorithm is based on the simulated annealing approach. The cooling schema reduces the number of iterations required for the agents to enter a stable low energy state. Figure
Steps involved in WebBluegillRecom-annealing dynamic recommender system.
In the proposed WebBluegillRecom-annealing dynamic recommender system, each user obtained after the data cleaning process is initially mapped to an agent. The agents are placed randomly on the 2D visualization panel. A cooling algorithm is then applied to bring similar agents nearer to each other in the visualization panel, which gives an initial neighborhood for the agents. A better neighborhood is formed in each iteration of the algorithm by adjusting the positions of the agents on the visualization panel. That is, the users that exhibit similar behavior will form a hinterland. To handle dynamic data that is collected incessantly and to improve the quality of the neighborhood, a dynamic clustering technique is applied. Recommendations are given to users as best
The proposed WebBluegillRecom-annealing recommender system can handle the following challenges of web usage mining: information overload, dynamic behavior of users, a large number of iterations before convergence, scalability, and overspecialization in recommendations.
Web server log files are preprocessed and input user sessions are identified. Here the
In this paper, a dynamic clustering based data mining technique is used to discover interesting online usage patterns. Unlike conventional clustering, in dynamic clustering [
In the proposed WebBluegillRecom-annealing algorithm (Algorithm
[Algorithm listing. Input: dataset.]
[Algorithm listing. Notation refers to positions on the visualization plane.]
[Algorithm listing. Input: Agent; output: Clusters.]
[Algorithm listing. Input: Extracted Clusters; output: Extracted Clusters.]
In the proposed WebBluegillRecom-annealing algorithm, initially each user is mapped to an agent. Agents are placed on the visualization panel randomly. Visualization panel is a two-dimensional plane represented by
To bring similar agents closer together and move dissimilar agents farther apart in the visualization panel, we use a cooling algorithm based on the annealing of metals. The attributes of this thermodynamic process map onto simulated annealing optimization: a system state represents a feasible solution, energy represents cost, a change of state represents the neighborhood function, temperature represents the control parameter, and the frozen state represents the final solution. Initially, when the temperature is high, the algorithm accepts bad moves, because the starting solution may be poor and would otherwise be difficult to escape from. When the temperature is low, it rejects almost all bad moves. The best result found is kept as the final solution.
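The temperature dependence described above is commonly expressed through the Metropolis acceptance criterion. The source does not state which acceptance rule the cooling algorithm uses, so the form below is an assumption; $\Delta E$ (the cost increase of a candidate move) and $T$ (the current temperature) are symbols introduced here for illustration.

```latex
P(\text{accept}) =
\begin{cases}
1, & \Delta E \le 0,\\[4pt]
\exp\!\left(-\dfrac{\Delta E}{T}\right), & \Delta E > 0.
\end{cases}
```

Under this rule, a bad move with $\Delta E = 1$ is accepted with probability $e^{-0.1} \approx 0.90$ at $T = 10$, but only $e^{-10} \approx 4.5 \times 10^{-5}$ at $T = 0.1$, which matches the behavior at high and low temperature described in the text.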
The inputs to the cooling algorithm (Algorithm
The cooling algorithm is a greedy heuristic that lets the agents move from their current positions to the best neighboring solution; to avoid local minima, it also supports uphill moves. It returns the agents with their new positions on the visualization panel, where similar agents now lie close together: as the distance between two agents on the panel increases, their similarity decreases. After the cooling function is applied, the agents converge to a frozen low energy state in which similar agents are located near each other. Using the cooling algorithm reduces the number of iterations. The agents, with their updated positions, are given as input to the cluster-creation algorithm.
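One way to picture the cooling step is as an annealing search over agent positions on a 2D panel. The sketch below is not the paper's cooling algorithm, whose listing is not available here. It assumes an energy function that penalizes the mismatch between panel distance and dissimilarity (1 − similarity), a geometric cooling schedule, and illustrative parameter names (`t_start`, `t_end`, `alpha`, `step`).

```python
import math
import random


def layout_energy(pos, sim):
    """Mismatch between panel distances and dissimilarities (1 - similarity)."""
    total = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = math.dist(pos[i], pos[j])
            total += (d - (1.0 - sim[i][j])) ** 2
    return total


def cool_layout(sim, t_start=1.0, t_end=1e-3, alpha=0.98, step=0.2, seed=0):
    """Anneal random 2D agent positions; return (initial, best) layouts."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in sim]  # random placement
    initial = list(pos)
    e = layout_energy(pos, sim)
    best, best_e = list(pos), e
    t = t_start
    while t > t_end:
        # Propose moving one randomly chosen agent a small distance.
        k = rng.randrange(len(pos))
        trial = list(pos)
        trial[k] = (pos[k][0] + rng.uniform(-step, step),
                    pos[k][1] + rng.uniform(-step, step))
        trial_e = layout_energy(trial, sim)
        # Uphill moves are accepted with a probability that shrinks as t drops.
        if trial_e <= e or rng.random() < math.exp((e - trial_e) / t):
            pos, e = trial, trial_e
            if e < best_e:
                best, best_e = list(pos), e  # keep the best layout seen
        t *= alpha  # geometric cooling (assumed schedule)
    return initial, best


# Hypothetical similarity matrix: agents 0 and 1 are alike, as are 2 and 3.
sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
initial, best = cool_layout(sim)
```

Because the best layout seen so far is retained, the returned layout never has higher energy than the random starting layout, even though individual uphill moves are allowed along the way.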
In cluster-creation algorithm (Algorithm
In the Bluegill-BestPredictions algorithm (Algorithm
The Bluegill-BestPredictions algorithm, described as Algorithm
In line 2 of Algorithm
For a given URL Ui,
If the similarity of agent
[Algorithm listing. Input: Cluster; output: Cluster.]
If the similarity of agent
If the similarity of agent
These movements of agents result in new groups of agents, the merging or splitting of some clusters, or even the deletion of some clusters. The agents with their new positions are shown in the visualization panel (line 71). In line 73, all the URLs visited more often than a predefined threshold value Url_count_threshold are displayed for each valid cluster. Line 77 assigns the set of neighboring agents of agentj that lies within a distance
Here, the performance of the proposed WebBluegillRecom-annealing algorithm is compared with the traditional collaborative filtering based recommender system. The performances are evaluated in terms of coverage, precision, and
In Section
The proposed WebBluegillRecom-annealing dynamic recommender system is applied to a high dimensional real-life data example. It is implemented using the Java Agent DEvelopment Framework (JADE). Figure
Initial position of agents.
In Figure
New position of agents after applying cooling algorithm.
In Figure
Clusters of agents obtained after applying cluster-creation algorithm.
Here the neighboring agents that lie within a distance threshold
Figure
Clusters of agents obtained after applying Bluegill-BestPredictions algorithm.
The quality of the obtained clusters can be measured in terms of intracluster and intercluster similarity. Intracluster similarity is the similarity between the elements of a cluster; intercluster similarity is the similarity between the elements of a cluster and the members of other clusters. For good clusters, intracluster similarity should always be higher than intercluster similarity. Table
Average results of 10 runs of WebBluegillRecom-annealing algorithm with ICTF = 0.1 and
Intracluster similarity (cluster-creation algorithm) | Intercluster similarity (cluster-creation algorithm) | Run | Intracluster similarity (Bluegill-BestPredictions algorithm) | Intercluster similarity (Bluegill-BestPredictions algorithm) |
---|---|---|---|---|
0.437 (0.014) | 0.014 (0.001) | 10 | 0.468 (0.011) | 0.009 (0.003) |
0.409 (0.019) | 0.019 (0.003) | 5 | 0.399 (0.019) | 0.010 (0.001) |
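The two cluster quality measures reported above can be computed along the following lines. This is a sketch under assumptions: sessions are represented as sets of URLs, similarity is taken to be cosine similarity over those sets, and both measures are averaged over pairs. The paper's own similarity function may differ.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sessions given as sets of URLs."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def intracluster_similarity(cluster):
    """Mean pairwise similarity among the members of one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def intercluster_similarity(cluster_a, cluster_b):
    """Mean similarity between members of two different clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical clusters of sessions (letters stand in for URLs).
c1 = [{"a", "b"}, {"a", "b", "c"}, {"a"}]
c2 = [{"x", "y"}, {"x", "c"}]
intra = intracluster_similarity(c1)
inter = intercluster_similarity(c1, c2)
```

For these toy clusters the intracluster value is well above the intercluster value, which is the relationship expected of a good clustering.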
From Table
Sample user profiles generated by WebBluegillRecom-annealing algorithm with ICTF = 0.1 and
URL frequency | URL |
---|---|
Profile 1 | |
0.4 | roshni/courses/cse309/notes.html |
0.39 | roshni/courses/cse309/slides.html |
0.35 | roshni/courses/cse312/notes.html |
|
|
Profile 3 | |
0.42 | mia/courses/cse312/slides.html |
0.41 | mia/courses/cse312/notes.html |
0.39 | mia/courses/cse312/assignments.html |
0.24 | mia/courses/cse312/ |
0.18 | mia/courses/cse312/quiz.html |
0.12 | mia/courses/cse312/internals.html |
0.11 | mia/courses/cse312/proj |
|
|
Profile 4 | |
0.41 | neha/courses/cse475 |
0.37 | neha/courses/cse312/ |
0.36 | neha/courses/cse309/slides.html |
0.24 | neha/courses/cse475/notes.html |
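A profile of the kind shown above can be sketched as the relative frequency of each URL across the sessions of a cluster, keeping only the URLs above a threshold. The function name `cluster_profile` is hypothetical, and treating the threshold as a relative frequency (rather than a raw count) is an assumption; the parameter `url_count_threshold` only mirrors the name Url_count_threshold used in the text.

```python
from collections import Counter


def cluster_profile(sessions, url_count_threshold=0.1):
    """Relative URL frequencies over a cluster's sessions, above a threshold."""
    if not sessions:
        return []
    # Count each URL at most once per session.
    counts = Counter(url for session in sessions for url in set(session))
    freq = {url: n / len(sessions) for url, n in counts.items()}
    kept = [(f, url) for url, f in freq.items() if f > url_count_threshold]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda item: (-item[0], item[1]))


# Hypothetical cluster of four sessions (letters stand in for URLs).
cluster_sessions = [["a", "b"], ["a"], ["a", "c"], ["b"]]
profile = cluster_profile(cluster_sessions, url_count_threshold=0.3)
```

Here "a" appears in three of four sessions and "b" in two, so both pass the 0.3 threshold, while "c" (one of four sessions) is dropped.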
Table
Sample recommendations given to a user by the WebBluegillRecom-annealing dynamic recommender system with
User | Recommended URL |
---|---|
mia | courses/cse309/notes.html |
mia | courses/cse309/slides.html |
mia | /courses/cse312/slides.html |
mia | /courses/cse312/notes.html |
mia | /courses/cse312/assignments.html |
mia | /courses/cse475 |
Table
Coverage measures how complete a summary profile’s items are compared with the data being summarized, that is, whether they include all the data items:
Here
In Figure
WebBluegillRecom-annealing system (WR) is compared to standard collaborative filtering (CF) system on precision, 248 users.
From Figure
WebBluegillRecom-annealing system (WR) is compared to standard collaborative filtering (CF) system on coverage, 248 users.
From Figure
Balancing of precision and coverage can be represented using
WebBluegillRecom-annealing system (WR) is compared to standard collaborative filtering (CF) system on
From Figure
Figure
WebBluegillRecom-annealing system (WR) is compared to standard collaborative filtering (CF) system on precision,
From Figure
WebBluegillRecom-annealing system (WR) is compared to standard collaborative filtering (CF) system on coverage,
From Figure
WebBluegillRecom-annealing system (WR) is compared to standard collaborative filtering (CF) system on
From Figure
WebBluegillRecom-annealing system (WR) is compared to standard collaborative filtering (CF) system on variety averaged over 10 different runs per 1 active user, for
Variety means number of distinct recommended items [
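The evaluation measures used in this section can be illustrated with a common set-based formulation. The definitions below are assumptions rather than the paper's exact formulas, which are not available here: precision is taken as the share of recommended items that the user actually visited, coverage as the share of visited items that were recommended, and the balance between them as their harmonic mean (the F1 measure). Variety follows the definition given in the text, the number of distinct recommended items.

```python
def precision(recommended, actual):
    """Fraction of recommended items that the user actually visited."""
    recommended, actual = set(recommended), set(actual)
    return len(recommended & actual) / len(recommended) if recommended else 0.0


def coverage(recommended, actual):
    """Fraction of the user's visited items that were recommended."""
    recommended, actual = set(recommended), set(actual)
    return len(recommended & actual) / len(actual) if actual else 0.0


def f1(p, c):
    """Harmonic mean of precision and coverage (assumed balancing measure)."""
    return 2 * p * c / (p + c) if p + c else 0.0


def variety(recommendation_lists):
    """Number of distinct items across all recommendation lists."""
    return len(set().union(*recommendation_lists))


# Hypothetical example (letters stand in for URLs).
recommended = ["a", "b", "c", "d"]
actual = ["b", "c", "e"]
p = precision(recommended, actual)   # 2 of 4 recommendations were visited
c = coverage(recommended, actual)    # 2 of 3 visited items were recommended
balance = f1(p, c)
distinct = variety([["a", "b"], ["b", "c"]])
```

A recommender that returns many items tends to raise coverage at the cost of precision, which is why a single balancing measure is useful when comparing systems.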
To summarize, Figures
Figure
WebBluegillRecom-annealing system (WR) is compared to standard collaborative filtering (CF) system on precision for different volume of users in different months.
From Figure
WebBluegillRecom-annealing dynamic recommender system (WR) is compared to standard collaborative filtering (CF) system on coverage for different volume of users in different months.
In Figure
WebBluegillRecom-annealing system (WR) is compared to standard collaborative filtering (CF) system on coverage for different volume of users in different months.
In Figure
To summarize, Figures
This paper presents a new dynamic recommender system called WebBluegillRecom-annealing. The system is based on swarm intelligence inspired by the dynamic foraging behavior of bluegill fish, with the artificial life simulated by software agents. It is capable of handling dynamic data. It uses an annealing approach to identify the initial best neighborhood for agents, thereby reducing the number of iterations before convergence. It also includes variety in its recommendations, thereby overcoming the overspecialization problem of some traditional recommendation systems. The results are compared with a traditional collaborative filtering system. The experiments show that the WebBluegillRecom-annealing recommender system handles dynamic behavior and seasonality in users’ interests better than traditional collaborative filtering systems, and that its recommendations have better values for precision, coverage, and
The authors declare that there is no conflict of interests regarding the publication of this paper.