The dynamic nature of information resources and the continuous changes in users' information demands have made it very difficult to provide effective methods for data mining and document ranking. This paper proposes an efficient particle swarm chaos optimization mining algorithm, based on chaos optimization and particle swarm optimization, that uses a user feedback model to provide the user with a list of best-matching webpages. The proposed algorithm starts with an initial population of many particles moving around in a

With the rapid development of Internet technology, the growing number of webpages and volume of content have led to an explosion in the amount of available information. While some webpages are more relevant, popular, or authoritative than others, web users expect to easily find the most interesting and significant websites by specifying relevant keywords [

It is well known that web search is one of the most universal and influential applications on the Internet. Web search engines can support users on a wide variety of topics across a comprehensive range of websites [

Web mining can be viewed as the extraction of structure from an unlabeled, semistructured data set containing the characteristics of users and information [

A number of novel optimization methods have been proposed to optimize the web usage evaluation function. The conventional web mining approach performs a type of relevance ranking, whereby a webpage may be relevant to a topic or theme. Given the amount of available information, the processing requires approaches that extract only the relevant, sometimes hidden, knowledge as the final result of the problem under consideration. A heuristic intelligent mining approach for web mining can be derived using particle swarm optimization (PSO).

This paper focuses on a user feedback model that uses chaos optimization and particle swarm optimization to help the user find useful information as fast as possible. The dynamic nature of web information and the continuous changes in users' information demands have made it very difficult to provide efficient and effective data mining techniques. To search for useful information effectively and efficiently, we have developed an efficient particle swarm chaos optimization mining algorithm (PSCOMA), based on chaos optimization and particle swarm optimization, that uses a user feedback model to provide the user with a list of best-matching webpages. We compare our approach with the PSO, PCS, and HITS algorithms; as far as we know, it is currently the best method for the problem considered. Experimental results show that our approach significantly outperforms the other algorithms in most cases.

The remainder of the paper is organized as follows. A brief survey is given in Section

In this section, we focus our discussion on prior research on web mining. Web mining is a very important problem and has attracted the attention of many researchers.

Drs. Yin and Guo proposed a new formulation for the website structure optimization (WSO) problem based on a comprehensive survey of existing works and practice considerations [

The rank value indicates the importance of a particular page. A hyperlink to a page counts as a vote of support. Link analysis is a method for determining which pages are good for particular topics based on both the quantity and quality of links pointing to that document. PageRank is a link analysis algorithm, named after Larry Page and used by the Google Internet search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents [
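As a rough illustration of link analysis, the following is a minimal power-iteration sketch of a PageRank-style computation. The function name `pagerank`, the dictionary-based graph format, the damping factor of 0.85, and the uniform handling of pages without outgoing links are illustrative assumptions and are not taken from this paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to (assumed format).
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each hyperlink passes an equal share of its source's rank: a "vote".
        for p, targets in links.items():
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank


example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
example_rank = pagerank(example_links)
```

In this toy graph, page `c` receives links from three pages and therefore ends up with the highest rank, which matches the idea that both the quantity and the quality of incoming links matter.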

In this section, we design the optimal user feedback model for evaluating the weight of a webpage. Explicit feedback is feedback in which the user is asked to fill out a feedback form after obtaining search results [

Assume that

If

When the feedback model of the user is complete, we propose to define the weight

Assume that

Particle swarm optimization is an evolutionary computation technique based on swarm intelligence and inspired by the social behavior of birds flocking for food; it was first introduced by Kennedy and Eberhart to optimize various continuous functions. Compared with other mathematical and evolutionary algorithms, it is computationally effective, converges quickly, and is easier to implement, and only a few parameters need to be adjusted.
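To make the mechanism concrete, the following is a minimal sketch of one update step of the canonical PSO with an inertia weight. It is not necessarily the exact update rule used by PSCOMA; the function name `pso_step` and the default values of `w`, `c1`, and `c2` are illustrative assumptions.

```python
import random


def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply one canonical PSO update to every particle, in place.

    positions, velocities, pbest: one list of floats per particle.
    gbest: best position found so far by the whole swarm.
    w, c1, c2: inertia weight and acceleration coefficients (assumed values).
    """
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            # Inertia term + pull toward own best + pull toward swarm best.
            v[d] = (w * v[d]
                    + c1 * r1 * (pbest[i][d] - x[d])
                    + c2 * r2 * (gbest[d] - x[d]))
            x[d] += v[d]
```

The only tunable quantities are `w`, `c1`, and `c2`, which illustrates why the method is considered easy to implement and configure.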

Chaos is a universal phenomenon in many nonlinear systems. Chaos optimization can escape from local minima more easily than other stochastic optimization algorithms, which escape from local minima by accepting some worse solutions with a certain probability [

This paper proposes an efficient particle swarm chaos optimization mining algorithm that attempts to balance exploration and exploitation. PSCOMA makes full use of the strong global search ability of PSO and the strong local search ability of chaos optimization to obtain high-quality solutions. PSCOMA uses the ergodicity, stochasticity, and regularity of chaos to guide particle exploration.

The webpages that have higher weight are selected to compose an initial population that is then analyzed by PSCOMA. The basic idea of PSCOMA is as follows. The proposed algorithm starts with an initial population of

Each particle is assigned a fitness value indicating the merit of the particle. Since all the particles represent candidate feasible solutions, we use (

Assume that

During the search process, the particle successively adjusts its position and updates its velocity toward the global optimum using two “best” values:

During the search for optimum values, a particle may escape its search space in any of the dimensions. In each iteration, when the fitness values of the particles tend to converge or settle at a local optimum, the inertia weight increases; when the fitness values are scattered, the inertia weight decreases. Therefore, in order to keep the inertia weight within a reasonable range, (

PSCOMA uses chaotic local search methods during the search process; namely, the chaotic map is used to control the value of parameters in the velocity updating equation. The specific implementations are as follows.

Each of the individual

Use the logistic equation for chaotic iteration so as to get chaotic gene series:

Convert the chaos variables

Evaluate the new solution on the basis of decision variables
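The chaotic local search steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the logistic map is used with control parameter 4 (the fully chaotic regime), lower fitness is treated as better, and the names `logistic_map` and `chaotic_local_search`, the bounds, and the iteration count are hypothetical and not taken from this paper.

```python
def logistic_map(z, mu=4.0):
    """One iteration of the logistic map z -> mu * z * (1 - z)."""
    return mu * z * (1.0 - z)


def chaotic_local_search(x, lower, upper, fitness, iterations=50):
    """Search with a logistic-map sequence; return the best (position, fitness).

    Assumes minimization: a candidate replaces the incumbent if its fitness is lower.
    """
    span = [hi - lo for lo, hi in zip(lower, upper)]
    # Map each decision variable into the chaos interval (0, 1).
    z = [(xi - lo) / s for xi, lo, s in zip(x, lower, span)]
    # Keep away from 0 and 1, and nudge values that collapse onto fixed points.
    z = [min(max(zi, 1e-3), 1.0 - 1e-3) for zi in z]
    z = [zi + 1e-3 if zi in (0.25, 0.5, 0.75) else zi for zi in z]
    best_x, best_f = list(x), fitness(x)
    for _ in range(iterations):
        # Chaotic iteration to obtain the next chaos variables.
        z = [logistic_map(zi) for zi in z]
        # Convert the chaos variables back to decision variables.
        candidate = [lo + zi * s for zi, lo, s in zip(z, lower, span)]
        # Evaluate the new solution and keep it only if it improves.
        f = fitness(candidate)
        if f < best_f:
            best_x, best_f = candidate, f
    return best_x, best_f


def toy_fitness(x):
    # Stand-in objective (sum of squares); not the paper's fitness function.
    return sum(xi * xi for xi in x)


start = [3.0, -2.0]
found_x, found_f = chaotic_local_search(start, [-5.0, -5.0], [5.0, 5.0], toy_fitness)
```

Because the incumbent is replaced only on improvement, the returned fitness is never worse than that of the starting point.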

The process of PSCOMA consists of the following seven steps.

Data preparation: training, validation, and test sets are represented, respectively.

Particle initialization and PSCOMA parameters setting: the webpages that have higher weight are selected to compose an initial population, which

Randomly generate the position and velocity of particles.

Evaluate the fitness of each particle and store the current position of each particle and the adaptation degree of each particle in

Update the velocity and position of each particle using (

Perform the following chaotic local search for the best particles in population and update its

Stop condition checking: if
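The overall flow of these steps can be sketched as a compact hybrid loop. This is only an illustration under several assumptions: a toy sum-of-squares objective replaces the paper's fitness function, fitness is minimized, the stop condition is a fixed iteration budget, positions are clamped to the search bounds, chaotic local search is applied only to the global best, and all parameter values and names (`hybrid_pso_chaos`, `sphere`) are hypothetical.

```python
import random


def sphere(x):
    # Toy objective standing in for the paper's fitness function (assumption).
    return sum(xi * xi for xi in x)


def hybrid_pso_chaos(fitness, dim, lower, upper, n_particles=20, max_iter=100,
                     w=0.7, c1=1.5, c2=1.5, chaos_iters=20, seed=0):
    rng = random.Random(seed)
    span = upper - lower
    # Randomly generate the position and velocity of each particle.
    pos = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.1 * rng.uniform(-span, span) for _ in range(dim)] for _ in range(n_particles)]
    # Evaluate fitness and store personal bests and the global best.
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(max_iter):  # stop condition: fixed budget (assumption)
        # Update the velocity and position of each particle.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so that particles cannot leave the search space.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower), upper)
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
        # Chaotic local search driven by the logistic map, seeded by the global best.
        z = [min(max((x - lower) / span, 1e-3), 1.0 - 1e-3) for x in gbest]
        for _ in range(chaos_iters):
            z = [4.0 * zi * (1.0 - zi) for zi in z]
            cand = [lower + zi * span for zi in z]
            f = fitness(cand)
            if f < gbest_f:
                gbest, gbest_f = cand, f
    return gbest, gbest_f


best_position, best_fitness = hybrid_pso_chaos(sphere, dim=2, lower=-5.0, upper=5.0)
```

The fixed seed makes runs repeatable, which is convenient when averaging outcomes over many independent runs with different seeds.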

In this section, we present the experimental results, which include the algorithm parameter configuration and a performance comparison with other algorithms. The platform for the experiments is a PC with an Intel Core 2 Duo E6300 processor running at 1.86 GHz. All programs are coded in C# on a Windows NT platform. The numerical results are the means of outcomes from 50 independent runs of the algorithms.

The experimental results compare the PSCOMA with several typical web mining algorithms including the PSO, PCS, and HITS algorithms. We experimented with a few queries on six popular search engines, namely, AltaVista, Netscape, Excite, Google, Direct Hit, and Yahoo. We denote the number of webpages and keywords in

Figures

Response time for different numbers of keywords (

Response time for different numbers of keywords (

Response time for different numbers of webpages (

Response time for different numbers of webpages (

Execution time for different numbers of generations (

Execution time for different numbers of generations (

Precision for different numbers of generations (

Precision for different numbers of generations (

Recall for different numbers of generations (

Recall for different numbers of generations (

Contrast curve of recall and precision (

As we can see from Figures

From the experimental results on response time, execution time, precision, and recall, we can conclude that PSCOMA performs better than the PSO, PCS, and HITS algorithms.
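For reference, precision and recall in the standard information retrieval sense can be computed as in the following minimal sketch. The function name `precision_recall` and the example page identifiers are hypothetical; the paper's exact evaluation protocol may differ.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved pages.

    precision = relevant retrieved pages / all retrieved pages
    recall    = relevant retrieved pages / all relevant pages
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 pages returned, 3 pages actually relevant, 2 in common.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

Here two of the four returned pages are relevant (precision 0.5), and two of the three relevant pages were returned (recall 2/3).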

To prevent the user from being overwhelmed by a large amount of redundant, useless, or uninteresting information, effective data mining approaches are needed. In this paper, we have presented a web mining approach based on chaos optimization and particle swarm optimization. This paper is the first to make full use of the strong global search ability of PSO and the strong local search ability of chaos optimization for web search, and it obtains higher-quality solutions in terms of response time, execution time, precision, and recall. In the future, we will extend the PSCOMA algorithm to other domains of data mining and investigate the possibility of getting closer to the optimum by improving the chaotic local search.

This research work was supported by the Hubei Key Laboratory of Intelligent Wireless Communications (Grant no. IWC2012007) and the Special Fund for Basic Scientific Research of Central Colleges, South-Central University for Nationalities (Grant no. CZY11005). The authors gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.