Resource Management Scheme Based on Ubiquitous Data Analysis

Resource management of main memory and the process handler is critical to enhancing the performance of a web server system. Because transaction delays affect incoming requests from web clients, web server systems pregenerate several web processes in anticipation of future requests. This procedure can decrease the web generation time, because enough processes are available to handle the incoming requests from web browsers. However, inefficient process management results in low service quality for the web server system, so a proper pregeneration mechanism is required to deal with clients' requests. Unfortunately, it is difficult to predict how many requests a web server system will receive: if it builds too many web processes, it wastes a considerable amount of memory space, and performance is reduced. We propose an adaptive web process manager scheme based on the analysis of web log mining. In the proposed scheme, the number of web processes is controlled through prediction of incoming requests, so that web transactions consume the fewest possible resources. In experiments, real web trace data were used to demonstrate the improved performance of the proposed scheme.


Introduction
Ubiquitous personal devices such as notebooks, smartphones, and web-enabled televisions allow their users to access the internet at any time. Further, people can access their preferred online media, such as social networking sites, on the go. This convenient access to web services has led to an increase in web traffic. Frequent accesses to a web service place a heavy burden on the web server, which in turn may delay the processing of users' requests. Long response times for incoming requests decrease the web server's quality of service; web users may even cancel their requests. To protect service quality, service providers should deliver the requested web documents to web users as soon as possible. To address this issue, web service providers have built web cluster servers to improve throughput; however, meeting users' demands remains a challenging task.
In general, a web document includes several kinds of data from the server, including HTML, images, audio, and video. To retrieve a single web document for a user, the web browser obtains one main web object and several related, embedded web objects of the requested web document. The web browser parses the main web object to obtain the list of embedded web objects for the requested web document and, based on that list, sends requests to retrieve them. When the retrieval of one embedded web object is delayed, the overall transaction time for the web document increases. Therefore, in order to view a web document, a web user must wait until all the embedded web objects are retrieved. A web server system must manage limited resources to process incoming requests in a relatively brief time. It is critical to manage the number of idle web processes for future requests. If the web server system only creates processes reactively, that is, after receiving requests from a user, it takes more time. To reduce the time required to generate processes, the web server system proactively creates idle web processes, so that when a web browser establishes a new connection, an idle process is already available to serve it.

Related Works
General web server systems like Apache assign one process to handle incoming requests. The process reads documents or obtains information from a database, depending on the requests from web browsers. Some web management schemes improve the performance of web server systems by using extra system resources; it is therefore critical to assign resources efficiently. CPU time and memory are the essential resources that determine the transaction time. If numerous idle processes are running concurrently to serve future requests, the server can save the time it would otherwise take to generate processes for incoming requests. However, such a scheme also consumes resources. To improve the quality of the web service, most web browsers support persistent connection and pipeline schemes. Both schemes consume web server resources to reduce the time spent on inefficient transactions such as generating processes or obtaining web objects sequentially.
Through a persistent connection scheme, the web browser obtains several web objects over the same connection. In Figure 2, the web browser acquires two web objects, object "A" and object "B." In the previous scheme, the browser must establish a new connection with the server to retrieve each object: after retrieving web object "A," the browser terminates the connection with the server and then establishes a new connection to retrieve web object "B." With persistent connections, processes and connections that handled previous requests are reused for future requests from the same web browser. This saves the transaction time of reestablishing the connection and process, but it costs memory to maintain processes dedicated to specific web browsers.
Web browsers can also save transaction time using a web pipeline scheme, which enables a web browser to retrieve related web objects simultaneously. Figure 3 shows the difference between the previous scheme and a web pipeline scheme. In the previous scheme, the web browser sends a request for the next web object only after retrieving the currently requested one; with the web pipeline scheme, the browser sends its request for the next web object before the current one has been retrieved. If the web browser establishes only one connection with a web server, this scheme cannot significantly decrease the transaction time. However, current web browsers utilize multiple connections with servers and send several requests concurrently through them. When a web browser establishes multiple connections, the web pipeline scheme improves the performance of the web server system. In short, persistent connection and web pipeline schemes enhance the performance of web servers at the cost of the server's limited system resources.
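To make the benefit concrete, the following sketch models sequential retrieval versus pipelined retrieval spread over several connections. The object counts and per-object latencies (in milliseconds) are purely hypothetical, not measurements from this paper.

```python
# Illustrative timing model (hypothetical latencies): compare sequential
# retrieval of embedded objects with pipelined retrieval over multiple
# connections.

def sequential_time(object_times):
    """Browser requests each object only after the previous one arrives."""
    return sum(object_times)

def pipelined_time(object_times, connections):
    """Requests are spread round-robin over several connections; each
    connection retrieves its share of objects back to back."""
    per_conn = [0] * connections
    for i, t in enumerate(object_times):
        per_conn[i % connections] += t
    return max(per_conn)  # transaction ends when the slowest connection finishes

times = [200, 200, 200, 200]        # four embedded objects, 200 ms each
print(sequential_time(times))       # 800
print(pipelined_time(times, 4))     # 200
```

With four 200 ms objects, a single sequential connection needs 800 ms, while four pipelined connections finish in roughly 200 ms, matching the observation above that pipelining pays off mainly when multiple connections are available.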

Dynamic Access Patterns.
Within the structure of a web object, a web browser accesses the related embedded objects after accessing the main object. However, the incoming access pattern for web objects is dynamic. When a web browser sends requests for embedded objects, some of them are provided by a proxy server rather than the web server. Proxy servers hold web objects after retrieving them from a web server; when these objects are requested again, the proxy server provides them to web browsers. Requests for cached web objects are therefore not transmitted to the web server, thus reducing its overhead. However, such a cache mechanism makes it difficult for the web server to determine how many requests are incoming from browsers. Table 1 shows which embedded objects are requested after one main object, "/atomicbk/main.html," in the ClarkNet web traces from [1,2]. When the main object is requested, "orders.gif" is requested most often, whereas "stats.gif," "userlink.gif," and "comic75.gif" are requested infrequently after retrieving the main object.
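The kind of frequency analysis behind Table 1 can be sketched as follows. The session data here are invented for illustration (the real traces are far larger); only the object name "/atomicbk/main.html" and the embedded-object names come from the text.

```python
from collections import Counter

# Count how often each embedded object is requested in the same session
# after a given main object. Sessions are simplified to ordered lists of
# requested object names; the sample data below are made up.
def embedded_frequencies(sessions, main_object):
    counts = Counter()
    for session in sessions:
        if main_object in session:
            start = session.index(main_object)
            for obj in session[start + 1:]:
                counts[obj] += 1
    return counts

sessions = [
    ["/atomicbk/main.html", "orders.gif", "comic75.gif"],
    ["/atomicbk/main.html", "orders.gif"],
    ["/other.html", "stats.gif"],
]
print(embedded_frequencies(sessions, "/atomicbk/main.html"))
```

On this toy input, "orders.gif" is counted twice and "comic75.gif" once, while "stats.gif" never follows the main object, mirroring the skewed pattern Table 1 illustrates.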

Previous Works.
There have been many previous attempts to predict incoming user requests. We classify them into two categories: chaining schemes based on Markov models and grouping schemes based on clusters of web objects. Chaining schemes are based on Markov models. A higher-order Markov model can provide more accurate predictions; however, increasing the order also increases the complexity of the prediction scheme. As a result, chaining schemes restrict the order of the Markov model. Some schemes, such as [3][4][5][6], use top-related objects to predict the next requests. Other schemes, such as [4,[7][8][9][10]], use long access sequences. The work in [11] designs dynamic PPM models.
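A minimal first-order instance of such a chaining scheme can be sketched as follows. The cited schemes use higher orders, pruning, and confidence thresholds, so this is only the core idea, with made-up training sequences.

```python
from collections import defaultdict, Counter

# First-order Markov chain over request sequences: the predicted next
# object is the one that most often followed the current object in the
# training log. Illustrative only; real schemes are more elaborate.
class MarkovPredictor:
    def __init__(self):
        self.transitions = defaultdict(Counter)

    def train(self, sequence):
        for cur, nxt in zip(sequence, sequence[1:]):
            self.transitions[cur][nxt] += 1

    def predict(self, current):
        follows = self.transitions.get(current)
        if not follows:
            return None  # object never seen as a predecessor
        return follows.most_common(1)[0][0]

p = MarkovPredictor()
p.train(["A", "B", "C"])
p.train(["A", "C", "B"])
p.train(["A", "B", "C"])
print(p.predict("A"))  # "B" followed "A" twice, "C" once
```

A higher-order variant would key the transition table on tuples of the last k requests instead of a single object, which is exactly where the complexity growth mentioned above comes from.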
Grouping schemes generate clusters of web objects and then predict a group of web objects for the next incoming requests. The work in [12][13][14] provides a caching policy for a Content Distribution Network platform. The work in [15,16] designs a caching policy for mobile environments. The work in [17] provides a prediction scheme using folder structure. The work in [18] suggests a divide-and-merge scheme via a hybrid of top-down and bottom-up schemes. The work in [19] designs a proxy model for prefetching embedded objects.
The work in [20,21] uses a vector model and semantic power for a web cluster system.
Hybrid schemes design a prediction scheme based on both Markov models and grouping schemes. The work in [22,23] suggests prediction schemes based on several concurrent models, including Markov models, association rules, and grouping schemes. The work in [24] uses an abstraction scheme for defining access patterns and defines user access paths through a Markov model. The work in [25] generates a group of access patterns to web objects using a K-means cluster scheme.
Although many examples of research provide prediction schemes for incoming requests, they do not provide process management schemes for modern web frameworks.

Prediction Scheme.
The work in [1] designs a web transaction prediction scheme called the Double PPM Scheme (DPS). When a web browser requests the main object, the browser also sends requests for related embedded objects to the server. Therefore, we can create a prediction scheme based on a grouping of one main web object and its related embedded objects. In Figure 4, DPS predicts the relationship between web objects in several steps. In the first step, DPS distinguishes between main and embedded objects, classifying objects by name. For example, when the name of an object includes "html," "php," or "jsp," it is recognized as a main object; usually such objects are web documents. When the name of an object contains "jpg," "mpg," or "ogg," it is classified as an embedded object. In the second step, DPS creates relationships between objects. Gray circles in Figure 4 indicate main objects, while white circles indicate related embedded objects. Arrows between circles show how frequently two web objects are accessed together in a single session. These symbols show the relationships within a single web document.
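The first DPS step described above, classification by object name, might be sketched like this. The extension lists extend the examples given in the text and are assumptions, not DPS's exact rules.

```python
# Classify an object as "main" or "embedded" from its file extension.
# The extension sets below follow the examples in the text ("html",
# "php", "jsp" vs. "jpg", "mpg", "ogg") plus a few assumed additions.
MAIN_EXTENSIONS = {"html", "htm", "php", "jsp"}
EMBEDDED_EXTENSIONS = {"jpg", "gif", "png", "mpg", "ogg", "css", "js"}

def classify(object_name):
    ext = object_name.rsplit(".", 1)[-1].lower()
    if ext in MAIN_EXTENSIONS:
        return "main"
    if ext in EMBEDDED_EXTENSIONS:
        return "embedded"
    return "unknown"

print(classify("/atomicbk/main.html"))  # main
print(classify("orders.gif"))           # embedded
```

A production classifier would also need the access-time heuristics mentioned later in the paper, since extensions alone misclassify dynamically generated content.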
Continuing in Figure 4, we see that this instance of DPS creates three groups: "A," "B," and "C." There are two different access patterns in this log file. After accessing document "A," some users access document "B" followed by document "C," whereas other users access document "C" followed by document "B."

Web Process Management.
ConWebPro determines the number of web processes to run based on the access patterns of web objects. DPS finds different access patterns based on the structure of these objects. If one document contains many embedded objects, a web browser will send many follow-up requests to the server after accessing the main object. Similarly, when a requested web document includes dynamic or frequently changing objects, browsers create more HTTP connections to obtain these related embedded objects.
By analyzing the web log, ConWebPro determines how many requests to embedded objects arrive after the main object is retrieved. In Figure 5, ConWebPro forms three groups depending on the number of related embedded objects. For document "A," even though "A" contains five embedded objects, only three embedded objects are requested after retrieving the main object. Document "B" has two embedded objects, and document "C" contains three. However, when web users request document "B" or document "C," the browser issues only one or two requests for embedded objects.
If the main object is requested, ConWebPro anticipates how many requests for embedded objects are forthcoming. If document "A" is requested, three more requests will come to the server; therefore, the server system creates three more processes to handle these incoming requests.
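This lookup can be sketched as follows. The count for document "A" mirrors the worked example above; the counts for "B" and "C" are assumed, and process creation is simulated by a counter rather than real forking.

```python
# Predicted embedded-object requests per document. "A" -> 3 follows the
# example in the text; "B" and "C" are assumed values for illustration.
PREDICTED_EMBEDDED = {"A": 3, "B": 1, "C": 2}

class ProcessPool:
    """Simulated pool: tracks idle processes instead of forking workers."""
    def __init__(self):
        self.idle = 0

    def on_main_request(self, document):
        needed = PREDICTED_EMBEDDED.get(document, 0)
        spawn = max(0, needed - self.idle)  # only top up to the prediction
        self.idle += spawn
        return spawn

pool = ProcessPool()
print(pool.on_main_request("A"))  # spawns 3 idle processes
print(pool.on_main_request("B"))  # 3 already idle, spawns 0
```

The top-up rule (spawn only the shortfall) is one plausible policy; the paper's scheme could equally spawn the full predicted count per request.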
When processing requests from browsers, the server system assigns processes to handle incoming requests. It also takes time to process these requests:

T_trans = T_init + T_proc, (1)

where T_trans, T_init, and T_proc are the whole transaction time, the initial delay in preparing a process, and the response time, respectively.
T_init differs depending on the number of idle processes: when an idle process has already been generated, we can save transaction time. Accordingly, we obtain (2) from (1):

T_trans = (H × T_init_hit + F × T_init_fail) + T_proc, (2)

where H, T_init_hit, F, and T_init_fail are the hit rate, the initial delay with an idle process, the fail rate, and the initial delay without an idle process, respectively. Accordingly, we obtain (3), because there is no initial delay on a hit (T_init_hit ≈ 0):

T_trans = F × T_init_fail + T_proc. (3)

F, T_init_fail, and T_proc are determined by the number of idle processes. If the number of idle processes is increased, F decreases but T_init_fail and T_proc increase, because too many running processes waste the limited resources of the web server system and, as a consequence, degrade its overall performance:

T_trans = F(NP_idle) × T_init_fail(NP_idle) + T_proc(NP_idle), (4)

where NP_idle is the number of idle processes awaiting future requests. Based on (4), a server system should run the smallest number of idle processes that still keeps the transaction time low. To determine the proper number of idle processes, we should predict future web traffic; unfortunately, such prediction is not easy. Overall requests are classified into two groups, requests to main web objects and requests to embedded web objects:

R_total = R_main + R_embedded, (5)

where R_total, R_main, and R_embedded are the overall requests, the requests to main objects, and the requests to embedded objects, respectively. It is not easy to predict R_total and R_main at the web server, because a web user can access a web document at any time. When a user accesses a web document, the web browser sends a request for the main object of that document.
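To see this trade-off numerically, the following sketch evaluates the transaction time under an assumed cost model: the shapes of the fail rate F and the processing time T_proc as functions of NP_idle are invented for illustration, and T_init_fail is held constant for simplicity even though the text notes it can also grow.

```python
# Illustrative trade-off (assumed cost model, not from the paper):
# more idle processes lower the fail rate F but raise the per-request
# processing time T_proc through memory pressure.
def fail_rate(np_idle):
    return max(0.0, 1.0 - 0.2 * np_idle)  # F falls as idle processes grow

def t_proc(np_idle):
    return 10 + 2 * np_idle               # ms; overhead grows with pool size

T_INIT_FAIL = 50  # ms to create a process on demand (assumed constant)

def t_trans(np_idle):
    return fail_rate(np_idle) * T_INIT_FAIL + t_proc(np_idle)

best = min(range(0, 8), key=t_trans)
print(best, t_trans(best))
```

Under these invented curves, the transaction time is minimized at an intermediate pool size rather than at zero or at the maximum, which is the qualitative point the derivation makes.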

Simulated System Configuration.
We demonstrated the performance of ConWebPro by applying it to real web traces collected over the course of two days. Based on DPS in [1], we obtained the relationships between web objects using the first day of web traces. ConWebPro then classified web documents into three groups depending on the number of embedded objects. Table 2 shows real traces from web sites including the Department of Computer Science at Gangneung-Wonju National University, NASA, and ClarkNet from [1]. ConWebPro obtained the relationships between web objects based on Day 1 and, based on the results, created web processes on Day 2. Web browsers retrieved objects through persistent connections and pipelines on HTTP 1.1; therefore, the browsers retrieved multiple embedded objects simultaneously.

Evaluation Results.
To evaluate the performance of ConWebPro, two schemes were compared on an Apache web server. An Apache server typically maintains a static number of idle processes to handle incoming requests from web users. This inefficient process management scheme wastes the available memory of the server, causing its performance to drop.

4.2.1. Step 1: Analysis of Web Objects.
ConWebPro obtains the structure of web objects based on the web log file from Day 1. As its first step, ConWebPro forms groups of web requests from the same users. The web log file contains all of the requests from every web user, and when multiple users access the web server simultaneously, it is difficult to detect the relationships between objects. ConWebPro therefore extracts the requests of each user based on the client IP address found in the web log.
As its second step, ConWebPro classifies web objects into main objects and embedded objects based on path and access time. In general, web users request "html" or "script" documents from the web server, so ConWebPro classifies "html" and "script" documents as main web objects. Embedded web objects, in turn, are requested together shortly after the main web objects. ConWebPro then forms a group containing each requested main object and its related embedded objects.
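The two steps just described can be sketched on a simplified log. The log format, IP addresses, and paths here are hypothetical; a real access log would carry timestamps and full request lines.

```python
# Each log entry is simplified to (client_ip, requested_path).
def group_by_client(log):
    """Step 1: collect each client's requests into one session."""
    sessions = {}
    for ip, path in log:
        sessions.setdefault(ip, []).append(path)
    return sessions

def is_main(path):
    return path.endswith((".html", ".php", ".jsp"))

def document_groups(session):
    """Step 2: attach each embedded object to the most recent main object."""
    groups, current = {}, None
    for path in session:
        if is_main(path):
            current = path
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].append(path)
    return groups

log = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.2", "/news.html"),
    ("10.0.0.1", "/logo.gif"),
    ("10.0.0.1", "/banner.jpg"),
]
sessions = group_by_client(log)
print(document_groups(sessions["10.0.0.1"]))
```

Interleaved requests from the two clients are first separated by IP, after which the embedded objects "/logo.gif" and "/banner.jpg" fall into the group of "/index.html".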
As its final step, ConWebPro determines a relationship based on the frequency of requests for web objects. After each request for a main web object, ConWebPro checks which embedded web objects are accessed frequently. Table 3 shows 10 main web objects and their related embedded web objects at Gangneung-Wonju National University; each main web object contains approximately 10 embedded web objects. Tables 4 and 5 show 10 main web objects and related embedded web objects from the ClarkNet and NASA web traces.

4.2.2. Step 2: Number of Web Processes.
The web log file does not indicate when a web client disconnects from the web server. We assume that the web client ends its connection with the web server after retrieving a web document, including one main object and multiple embedded objects. The web server can also reorganize the number of processes after releasing connections. As with general web servers, the static scheme attempts to maintain a predefined number of processes: Static-2 creates two processes, while Static-10 creates 10 processes for incoming requests. Our ConWebPro creates two processes for incoming requests to main web objects.
Figure 6 shows the number of web processes for the web server. The x-axis shows the simulation time, while the y-axis shows the number of times processes had been created for incoming requests. Following (4), a high hit rate for processes decreases the transaction time for processing requests. We compare our ConWebPro scheme with the two static schemes, Static-2 and Static-10. Web traces from Gangneung-Wonju National University were not heavy. The average hit requests for ConWebPro and Static-10 are higher than for Static-2; thus, the transaction times of processing requests on both ConWebPro and Static-10 are decreased.
Figure 7 shows the number of idle processes for the web server. The x-axis shows the simulation time, while the y-axis shows the number of idle processes reserved for future requests. Following (4), the web server should maintain a small number of idle web processes to preserve its performance. Even though Static-10 increases the hit rate for incoming requests, it wastes web server resources by maintaining too many idle processes. Static-2 shows a small number of idle processes. ConWebPro creates a small number of idle processes even though it achieves a high hit rate for web processes in Figure 6.
ConWebPro maintains two processes exclusively for incoming requests for main objects, so ConWebPro has more idle processes than Static-2, which maintains two processes for incoming requests for all objects, main and embedded alike. Figure 8 shows the number of times processes had been created. At the beginning of the simulation, incoming traffic was not heavy; therefore, all three schemes show a similar number of hits for incoming requests, and the small number of idle processes could initially handle the traffic. When web traffic increased in the middle of the simulation, Static-10 and ConWebPro still showed a high hit rate on web processes for incoming requests, whereas Static-2 increasingly missed them. Figure 9 shows how many idle processes were running on the NASA web server. Initially, Static-10 created too many idle processes and wasted resources, which increased the overhead on the web server as shown in (4). ConWebPro created a small number of idle processes and therefore conserved web server resources. Figure 10 shows the number of times processes had been created before incoming requests. ClarkNet contains heavy web traffic: Static-2 misses many incoming requests, while Static-10 and ConWebPro show a high hit rate. Figure 11 shows the idle processes running on the web server. Static-10 ran several idle processes to handle incoming requests, but it was more efficient in this case than with light web traces such as Gangneung-Wonju National University's traffic. We therefore conclude that only heavy web traffic justifies maintaining a high number of idle processes to reduce the transaction time. ConWebPro maintains the proper number of idle processes under both light and heavy web traffic.
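The qualitative comparison above can be reproduced on a toy trace. This is an assumed model, not the paper's simulator: a "hit" means an idle process was already waiting when a request arrived, and the adaptive policy is a simplified reactive stand-in for ConWebPro's prediction.

```python
# Replay a synthetic trace of request bursts against a process policy.
def replay(trace, policy):
    idle, hits = policy(None, 0), 0      # initial pool size
    for burst in trace:                  # burst = simultaneous requests
        hits += min(idle, burst)         # requests served by waiting processes
        idle = policy(burst, idle)       # pool size for the next step
    return hits

def static(n):
    """Always keep exactly n idle processes, like Static-2 / Static-10."""
    return lambda burst, idle: n

def adaptive(burst, idle):
    """Keep as many idle processes as the last burst needed (reactive
    stand-in for prediction)."""
    return burst if burst is not None else 2

trace = [2, 2, 8, 8, 2, 2]               # light -> heavy -> light traffic
print(replay(trace, static(2)))          # misses the heavy bursts
print(replay(trace, static(10)))         # hits everything but wastes memory
print(replay(trace, adaptive))
```

On this trace the adaptive policy scores nearly as many hits as Static-10 while keeping far fewer idle processes, which is the pattern Figures 6 through 11 report for ConWebPro.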

Conclusion
In this paper, we applied ConWebPro for efficient resource management on a modern web server system. To improve the quality of web service, popular web schemes such as pipelining and persistent connections consume web server resources; efficient resource management is therefore critical. To be more efficient, a web server system should correctly anticipate future requests. Our ConWebPro predicts how many resources should be assigned to process future requests using prediction based on the structure of web documents. When a web browser sends a request for a main web object containing embedded objects, ConWebPro adjusts the number of idle processes to handle the requests for those embedded objects. In future work, we will investigate how resource management affects the overall performance of web server systems.