A Process Mining Based Service Composition Approach for Mobile Information Systems

Due to the growing trend of applying big data and cloud computing technologies in information systems, handling the connection between large-scale data and the associated business processes is becoming an important issue in the Internet of Everything (IoE) environment. Service composition, a widely used phase in system development, has limits when the complexity of the relationships among data increases. Considering the expanding scale and variety of devices in mobile information systems, a process mining based service composition approach is proposed in this paper in order to improve the adaptiveness and efficiency of compositions. First, preprocessing is conducted to extract existing service execution information from server-side logs. Then process mining algorithms are applied to the preprocessed data to discover the overall event sequence. After that, a scene-based service composition is applied to aggregate scene information and relocate services of the system. Finally, a case study that applies the work in a mobile medical application shows that the approach is practical and valuable in improving service composition adaptiveness and efficiency.


Introduction
Along with the rapid advancements in big data and cloud computing technologies, the connection of everything is emphasized in many information systems. Thanks to the achievements of devices, infrastructure, and applications in mobile computing [1,2], systems become more powerful and intelligent with the support of connections among devices, people, and business processes. In particular, according to recent research [3], mobile technology development has resulted in the creation of up to 1,450,000 applications for smart phones in the last few years. More and more information systems rely on service-oriented processes in order to fit the continually changing business environment and to align business strategies with IT systems [4]. With strong interaction with people and social environments, these systems have a great impact in many areas such as health care [5,6], exploiting indoor location [7], and other scenarios. As a result, it is becoming more and more valuable to deal with the connection among devices and the interaction among people, especially in the environment of the Internet of Everything (IoE).
Due to the flexible and scalable characteristics of service-oriented computing, more and more systems use web service composition to deal with the complexity of multisource data in mobile information systems. Business processes and associated services become the most significant supports for the connection of everything. They make functions and devices work as expected in well-organized systems. Achieving adaptiveness in process-based service composition is the key to improving the efficiency and adaptiveness of mobile systems.
However, as both the scale and the variety of devices are expanding, the complexity of service implementation is increasing. To sum up, challenges exist in keeping the system process adaptive to the changing environment in the following respects: (1) The process execution environment is changing: in the environment of IoE, as users, devices, and services are widely distributed, the execution of the process may be affected by changing device rules, connection situations, and even users' habits. As more complex rules are introduced with the devices, static processes lack consideration of the execution environment, and they cannot handle the changing environment efficiently. For instance, in mobile systems, different versions of applications are used at the same time, which will make the processes on the server side suffer from errors if they cannot handle the changing orders of events.
(2) The complexity of the relationships among events and services is increasing: since the types of devices are increasing, the relationships among events and services are getting more complicated. Current process-based service composition is not flexible enough to support these complex situations. As a result, approaches designed for application execution are usually incomplete and lack necessary business consideration. For example, in a smart house application, when new devices such as new models of air conditioners are introduced, new events and new connections will be introduced, and the controlling process should be fixed accordingly in order to keep the devices and services working correctly.
In our previous work [8], service composition based on process mining was applied to a logistics cloud service platform which supports users from different companies in customizing their functional services. In the example case of the waybill transportation process, a suitable waybill-related composite service is generalized to connect information sensing devices such as radio frequency identification (RFID), infrared sensors, global positioning system (GPS), and laser scanners. That work showed that service composition based on process mining suits situations with indefinite requirements and without high performance demands on the resulting composite service. Considering the expanding scale and variety of devices in mobile information systems, this paper builds on our previous work and proposes a process mining based service composition approach in order to improve the adaptiveness and efficiency of compositions.
Generally speaking, the main contributions of this paper can be summarized as follows: (i) Firstly, to solve the problems above, process mining based service composition is proposed to produce adaptive service composition according to real execution information. A three-step framework is presented to cover the whole life cycle of service composition based on process mining.
(ii) Secondly, according to the framework, a set of models is put forward to support the holistic service composition approach, which covers both the practical business and the execution effectiveness.
(iii) Then, to apply request-based logs in event-based process mining, a preprocessing algorithm is presented to transform request-based logs into event-trace-based models so that the execution data can be used in process mining.
(iv) Last but not least, a scene-based service composition algorithm is presented in order to transform the process mining results into service composition models which can be further used in service generation.
The remaining parts of the paper are organized as follows: in Section 2, an overall description of the proposed approach is provided. After that, the formal analysis and algorithms in context-based service matching are described in Section 3. A case study is then presented in Section 4 to validate the method, followed by a brief discussion and comparison of the related works in Section 5. Finally, conclusions and future works are given in Section 6.

Overview of Process Mining Based Service Composition
In the environment of IoE, large numbers of event-based devices are involved in information systems. Each of them has individual rules due to the differences in types of devices, users, and execution context. In certain situations, they invoke a set of services to provide and retrieve data as well as execute special functions. Behind the devices, the server-side business processes, which represent the sequences of service execution and service composition, ensure the functional correctness of the whole system in either an explicit or an implicit way.
Process mining [11] is a process management technique that extracts information from event logs recorded by an information system to discover, analyze, and enhance process models. Service discovery mining is one of the most promising applications of state-of-the-art process mining technologies [12]. It includes discovering service behavior, checking the conformance of services, and extending service models based on event data. The processes discovered by process mining can provide the best practices during the execution period. The discovered processes with frequently used services can be regarded as composite web service patterns to help the development of service composition. To improve the fitness of event rules applied in widely spread devices and the business process maintenance in information systems, three concerns are involved, namely, the execution log from the IoE environment, control flow analysis for the server side, and the service composition.
In order to cover the life cycle, three phases in process mining based service composition are proposed in this paper, as shown in Figure 1.
First, the approach preprocesses the execution data from the current system by extracting the service logs and transforming them into valid traces. Then we leverage process mining algorithms to mine the control flow from the result of the previous step. After that, a metamodel is designed to connect the information of the execution environment with existing business rules, and a service deploy model is generated after relocating the service mapping. The description of the steps is as follows: (i) The first phase is to preprocess device request services: with execution information retrieved by preprocessing log data, the approach produces a service deploy model for constructing service compositions that match the requirements in IoE more accurately. Afterwards, new logs will be recorded during the execution of the composite service; therefore the whole life cycle of the service composition procedure becomes a closed loop.

Process Mining Based Service Composition
In the following part, the framework mentioned above will be refined to introduce its specifics.

Models for Process Mining Based Service Composition.
A set of models are defined in order to cover the life cycle of process mining based service composition in the three phases of the approach.Figure 2 shows three sets of models and their relationships involved in our approach, including Service Log Models, Process Mining Models, and Service Composition Models.

Service Log Models.
Service log models are the set of models that cover the preprocessing procedure. The included models are the Invocation Log Model, the Service Event Model, and the Trace Model, as in the following definitions.
Definition 1. The Invocation Log Model (ILM) represents the invocation records that devices executed as event requests. It is a list of service invocation records containing information on devices, users, services, and the execution timestamp. The definition of ILM is given in equations (1)-(4).
Definition 2. The event dictionary holds the mapping rules between events and the execution services, as shown in (5); each entry pairs an event with a service, $\{event, service\}$. The event is defined as in (6), and the service shares the same definition as that in (2).
Definition 4. The Trace Model (TM) contains a group of traces, each representing a sequence of continuous operation events, including a set of event models and the time duration information, as in equation (8).

Process Mining Models.
The process mining models store the information for process mining. The Extensible Event Stream (XES) serves as the unified data format between trace models and standard process mining input. The input of this phase is an XES file, a process instance that integrates multiple service events. It contains multiple processes, which are called traces in the XES standard, and every trace is related to a trace model that contains multiple events.
Definition 5. The business process model is the output of process mining. A business process is defined as a process that contains events and the control flow between them, presented as events and transitions. A set of frequencies representing the execution frequency of each event is also included for further analysis.
Algorithm 1: Preprocessing. Preprocesses logs for process mining by removing invalid records, eliminating similar requests within a short time, keeping the successful requests and deleting the others, connecting services with events, and dividing the events into traces according to time duration.
    (11) trace.add(r)
    (12) else
    (13) TM.add(trace)
    (14) trace ← {}
    (15) end if
    (16) end while
    (17) return TM
Definition 7. The Key Service Model (KSM) represents the mapping between events and the most suited services: $KSM \leftarrow \{event, service\}$. (11)

Execution Log Processing.
The log data in IoE is getting more complex with the increasing number of connections, leading to a larger scale of events and services. As a result, the raw service logs are not suitable for process mining due to noise and unclear boundaries. Therefore, in the first phase of our method, we extract the execution data from service logs, remove the noise data, and generate traces in the trace model. The preprocessing algorithm is shown as Algorithm 1.
Consider the record size of the initial logs as data size $n$. The data cleaning part (line (1) to line (5)) takes $O(n)$ time, since we only have to traverse the data once and remove dirty data by $O(1)$ determinations. The sorting part (line (6)) is a classic sorting problem which can be finished in $O(n \log n)$. Finally, the connecting part (the while loop) takes $O(n)$ time, because we go through the clean logs (fewer than $n$ records) again and creating a trace is an $O(1)$ operation. The overall complexity of the algorithm is therefore $O(n \log n)$. As we can see, the preprocessing procedure spends most of its time sorting the event records. If the records are already sorted in the initial logs, the algorithm runs in $O(n)$ time. As to space requirements, the cleaning part can be done in place, while the sorting part and the connecting part each take $O(n)$ space. Because the data size $n$ can be controlled by separating logs into different time periods, this step can be distributed and finished in acceptable time. Therefore the preprocessing step will not take too much time even for large-scale logs.

Preprocessing Noise Data.
In the preprocessing phase, service invocation logs are used as the input. The original logs record service invocation information. Logs contain information on process execution and bridge the gap between service composition and service deployment. However, the logs cannot be used as input of process mining directly because of different viewpoints of data organization and different structures of data storage. Therefore, before process mining, it is necessary to remove the outdated and incorrect data in the logs and extract the required information.
First of all, we manually decide the valid users, the valid time, and the maximum transaction duration, which together form the valid configuration. Then we remove the invalid records according to the valid configuration. After that, we eliminate the duplicate records that are produced by connection errors in the network.
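The cleaning step described above can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 1: the record fields, the `success` flag, and the two-second `duplicate_window` are assumptions introduced for the example, and the sketch sorts before filtering purely for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class InvocationRecord:
    # Hypothetical record layout: device, user, requested service, time, outcome.
    device: str
    user: str
    service: str
    timestamp: datetime
    success: bool = True


def clean_logs(records, valid_users, valid_from, valid_to,
               duplicate_window=timedelta(seconds=2)):
    """Drop invalid records, failed requests, and near-duplicate retries."""
    kept = []
    last_kept = {}  # (user, service) -> timestamp of the last kept record
    for r in sorted(records, key=lambda rec: rec.timestamp):
        if r.user not in valid_users:
            continue  # user outside the valid configuration
        if not (valid_from <= r.timestamp <= valid_to):
            continue  # timestamp outside the valid time range
        if not r.success:
            continue  # keep only successful requests
        key = (r.user, r.service)
        prev = last_kept.get(key)
        if prev is not None and r.timestamp - prev <= duplicate_window:
            continue  # same user and service again within the window: a retry
        last_kept[key] = r.timestamp
        kept.append(r)
    return kept
```

The duplicate window is a tunable assumption; a real deployment would choose it from the observed retry behavior of the network.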

Generating Event Model.
The next step is to transform the records into event models with the assistance of the event dictionary. As mentioned above, the original service invocation logs are stored in the form of ILM, whereas process mining is based on event data in the form of EM. So we transform the ILM into EM by matching the service attribute of each ILM record against the service attribute of the event dictionary entries, which is the event lookup step in the algorithm.
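A minimal sketch of this dictionary lookup is given below. The tuple layout and the example URL are hypothetical; only the idea of mapping a requested service to an event through the event dictionary comes from the text.

```python
def to_event_models(records, event_dictionary):
    """Map (user, service, timestamp) records to (user, event, timestamp).

    `event_dictionary` maps a service identifier (for example a request URL)
    to an event name. Records whose service has no entry are skipped.
    """
    events = []
    for user, service, timestamp in records:
        event = event_dictionary.get(service)
        if event is None:
            continue  # no mapping rule for this service
        events.append((user, event, timestamp))
    return events


# Hypothetical dictionary entry; the URL is illustrative only.
example_dictionary = {"/api/city": "Select City"}
```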

Generating Trace Model.
The last step of preprocessing is to reorganize the event models into trace models.
Instead of the Iterative Expectation-Maximization Procedure introduced in [13], which takes too much time when confronted with a large amount of logs, we use a dividing strategy based on time duration separation. First, we group the event models by the user attribute. That is, for each user, we have a group of (event, timestamp) pairs. Sorting each group by time yields the user's sequence of events. Then we separate the sequence into different traces according to the time duration.
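The dividing strategy can be illustrated with the sketch below, which assumes that a new trace starts whenever the gap between two consecutive events of the same user exceeds a threshold. The function name and tuple layout are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_into_traces(events, max_gap):
    """Group events by user, sort by time, and cut where the gap exceeds max_gap.

    `events` is an iterable of (user, event_name, timestamp) tuples.
    Returns a list of (user, [(event_name, timestamp), ...]) traces.
    """
    by_user = defaultdict(list)
    for user, name, ts in events:
        by_user[user].append((name, ts))

    traces = []
    for user, items in by_user.items():
        items.sort(key=lambda item: item[1])
        trace = []
        for name, ts in items:
            if trace and ts - trace[-1][1] > max_gap:
                traces.append((user, trace))  # gap too large: close the trace
                trace = []
            trace.append((name, ts))
        if trace:
            traces.append((user, trace))
    return traces
```

Sorting dominates the cost of this step, which agrees with the $O(n \log n)$ bound discussed for the preprocessing algorithm.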

Process Mining.
Process mining is a technique that extracts information from event logs recorded by an information system to discover, analyze, and enhance process models. As in Figure 3, the event logs come from the executing network of devices.

Transforming Trace Model to XES.
This step converts the information obtained from log processing into the input format required by process mining tools (such as ProM [14] and Disco [15]), which take XES (Extensible Event Stream) as input. An XES file is a process instance that integrates multiple service events. It contains multiple processes, which are called traces in the XES standard, and every trace contains multiple events, as in the left part of Figure 3.
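As an illustration of the conversion, the sketch below serializes traces into a bare-bones XES document using the standard `concept:name` and `time:timestamp` attribute keys. It is a simplified assumption of what the generated file looks like: a complete file would normally also declare the concept and time extensions and any global attributes the mining tool expects.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def traces_to_xes(traces):
    """Serialize traces to a minimal XES string.

    `traces` is a list of (case_id, [(event_name, datetime), ...]).
    """
    log = ET.Element("log", {"xes.version": "1.0"})
    for case_id, events in traces:
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string",
                      {"key": "concept:name", "value": str(case_id)})
        for name, ts in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string",
                          {"key": "concept:name", "value": name})
            ET.SubElement(event, "date",
                          {"key": "time:timestamp", "value": ts.isoformat()})
    return ET.tostring(log, encoding="unicode")
```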

Executing Process Mining.
For process mining, the fuzzy mining algorithm [16] is selected. In our implementation, we choose the fuzzy miner module of the Disco tool. The miner uses the significance and correlation of events to produce adaptable process models, as in the right part of Figure 3.

Scenario-Based Service Composition.
After the steps mentioned above, the process model is produced from the device-to-service invocation log. The next step is to adjust the process according to the execution frequency of events and relocate the services to the process. We provide the procedure as Algorithm 2.
Consider the total event size as data size $n$. Removing less important nodes (line (1) to (5)) takes $O(n)$, because we only have to calculate $\sum_{e_i \in E} f(e_i)$ once. In the event grouping and scene generalization part (line (7) to line (16)), calculating all the $sim(e_i, e_j)$ takes $O(n^2)$, and each add/remove operation can be done in $O(1)$. Since the while loop iterates at most $n$ times, the worst-case complexity of the algorithm is $O(n^3)$. As we can see, most of the time is spent generating the Composition Model. The number of iterations depends on the specific data. Compared to other composition approaches, the scenario generation takes extra time to simplify the processes. Since the event size will not be very large in systems, the time consumed is considered acceptable.

Scene-Based Event Analysis.
As a process mining result, a mined process is presented as a directed graph with nodes and edges. By analyzing the source and target of each transition in the process model, we can obtain the sequence of events in the process graph. In the graph, nodes represent events and edges indicate the transitions between events. Each edge has a weight representing the frequency of the transition.
Thus the importance of an event $e$ is the ratio of its frequency $f(e)$ to the sum of all the event frequencies. Events with very low frequency can be removed from the graph.
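The node reduction can be sketched as below, assuming an event is kept when its share of the total frequency reaches a cutoff. The `min_ratio` parameter is an assumption for illustration; no concrete cutoff is stated in the text.

```python
def prune_rare_events(event_freq, transitions, min_ratio):
    """Remove events whose frequency share is below min_ratio.

    `event_freq` maps event -> frequency; `transitions` maps
    (source, target) -> frequency. Edges touching a removed event are dropped.
    """
    total = sum(event_freq.values())  # computed once, as in the complexity analysis
    kept = {e: f for e, f in event_freq.items()
            if total and f / total >= min_ratio}
    kept_edges = {(s, t): w for (s, t), w in transitions.items()
                  if s in kept and t in kept}
    return kept, kept_edges
```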
For the edges, we denote the sum of all the input transition frequencies of an event $e$ as $in(e)$ and the sum of all its output transition frequencies as $out(e)$. From these sums, the event with the smallest value is taken as the start node of the process, and the event with the largest as the end node. For a transition $t$ and its source event $e$, the importance of the transition is computed from the transition frequency. If the importance of $t$ is much lower than normal, the transition hardly happens according to the existing logs, so it can be removed. Nodes with similarity close to 1 are normally executed as a patterned sequence. In other words, $e_i$ and $e_j$ are usually executed in similar situations. We can group $(e_i, e_j)$ as a scene, and this procedure is repeated iteratively.
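The iterative grouping can be illustrated with the greedy sketch below. The similarity scores are taken as given input, since this sketch does not attempt to reproduce the paper's similarity formula; the symmetric pair scores, the single-link merge rule, and the 0.9 threshold are all assumptions made for the example.

```python
from itertools import combinations


def group_scenes(events, sim, threshold=0.9):
    """Greedily merge the most similar groups until no pair reaches threshold.

    `sim` maps frozenset({a, b}) -> similarity in [0, 1]; missing pairs count as 0.
    Returns a list of frozensets, each one a scene.
    """
    scenes = [frozenset([e]) for e in events]

    def score(g1, g2):
        # Single-link rule: two groups are as similar as their closest members.
        return max(sim.get(frozenset((a, b)), 0.0) for a in g1 for b in g2)

    while len(scenes) > 1:
        best = max(combinations(scenes, 2), key=lambda pair: score(*pair))
        if score(*best) < threshold:
            break  # no remaining pair is similar enough
        scenes = [s for s in scenes if s not in best] + [best[0] | best[1]]
    return scenes
```

Each round scans all pairs of current groups and there are at most $n - 1$ merges, so the loop structure mirrors the repeated pairwise comparison described for Algorithm 2.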

Determine Key Services.
In this part, services are marked with priorities in order to pick the most suitable service for each event. Similar services exist in the service repository. However, these services have different influence in a particular process environment, so it is necessary to pick out the most suitable ones.
After process mining, two factors can be introduced in service selection: the relevance of service to event and the relevance of service to scene. For each event, each service has a priority. The same event may not invoke a fixed service every time, and one service may also be provided to multiple events, so we need a method to choose suitable services, that is, a strategy to extract the Key Service from all the invoked services (in the service repository). We calculate the weight of the service for the event to measure its criticality in service mapping. Let $n(e, s)$ denote the number of times service $s$ is executed for event $e$. Since the outdated data has been filtered out, $n(e, s)$ can be used to calculate the importance of service $s$ to event $e$. With the priority, each event can be related to its most frequently used services, which means $KSM \leftarrow \{event, service\}$ can be generated. The combination of the Composition Model and the KSM becomes the Service Deployment Model.
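A simplified sketch of key service selection is shown below. It uses only the service-to-event factor, scoring each service by its share of the invocations observed for an event, and does not reproduce the paper's weighted priority formula; the function name and the example service identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def key_services(invocations):
    """Pick, for each event, the service invoked most often for it.

    `invocations` is an iterable of (event, service) pairs taken from the
    cleaned logs. Returns {event: (service, share_of_invocations)}.
    """
    counts = defaultdict(Counter)
    for event, service in invocations:
        counts[event][service] += 1

    result = {}
    for event, counter in counts.items():
        service, n = counter.most_common(1)[0]
        result[event] = (service, n / sum(counter.values()))
    return result
```

For example, if a hypothetical event "Register" is served twice by "/v2/reg" and once by "/v1/reg", the function selects "/v2/reg" with a share of two thirds.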

Case Study: An Application in Mobile Medical System.
In this section a case study will be presented to demonstrate the approach.
One of the most promising uses of connecting everything is the application of IoE in medical processes.
For the case study, a mobile medical system in China with large numbers of smart devices (mostly smart phones) is used in this evaluation (as in Figure 4). In particular, a registration process is demonstrated in the following part.
As the mobile medical system is getting popular, it is widely used in many provinces across the country. The connection network of people, devices, and medical organizations has been growing recently. With a larger scale of usage, the system faces difficulties in the optimization of services. The devices have different operating systems and application versions. Due to the variability of operating systems, application versions, and geographical locations, the usage behavior cannot be unified. Unpredictable service usage leads to difficulty in the optimization of services and makes it inconvenient to update both mobile applications and server-side systems.

Preprocessing the Logs.
For the case study, five months of logs from the HTTP server of the system are used. The selected logs are from May 2015 to April 2016. Each record includes the device, the user, the requested service, and related request attributes. The initial log is shown in the left part of Figure 5.
In this log, each record represents a service request. Typical noise in the data includes duplicate operations, invalid operations, and unclear transaction boundaries. First, data cleaning is applied to the initial logs. Then, we execute the event mapping. The structure of the event dictionary is shown in the right part of Figure 5. After mapping the service request URL to events, each record is transformed into an event model, as in the bottom part of Figure 5.
To identify traces, the following rule is applied: to ensure that over 75% of traces are correctly identified, operations that take less than 30 min and 36 s are regarded as belonging to the same trace. The resulting Trace Model is shown in Figure 6.

Process Mining.
In the process mining phase, the first step is to transform the Trace Model into standard process mining input, that is, to generate an XES file with the above method. In the case study, after preprocessing, we convert the trace models into XES format, as in Figure 7(a). Disco is chosen as our process mining platform, where the XES file can be used directly as standard process mining input. After selecting filters (as in Figure 7(b)), we choose the fuzzy miner as the process mining strategy. The tool analyzes the interaction records among the business activities in the processes and, through mining and reasoning, derives the process model. After process discovery, the process model (as in Figure 7(c)) is stored in the form of an XML file (as in Figure 7(d)).

Scene-Based Composition.
Then we combine the service set with the event set. Each service is combined with an event according to the corresponding event ID. A similar step is applied to the role set as well. Figure 8 shows the optimization of the control flow in this case, which includes start node identification, similar event composition, and less significant event reduction.
Through service selection, a set of key services is generated. After we import the data of the process mining phase into the service composition phase, the Service Deployment Model can be generated. With template technologies, we can then generate the service descriptions for the service compositions of scenes. Figure 9 shows examples of the results of key service mapping and service generation. In Figure 9(a), the key services are mapped to the events (with the two weights set to 0.9 and 0.1 for priority calculation). Figure 9(b) shows one example of the generated WSDL descriptions for a composite service.
Then the composite service is registered in the service library and enters the service deployment phase. After long-term running, the execution of this service leaves behind service logs which can be used for the process mining phase of the next generation.

Result and Discussion.
After applying our work to the mobile medical system, the registration process of the system is improved with respect to two criteria.
First of all, the simplicity of the new process is improved after we compose the services that are invoked as a pattern. Secondly, as services are composed for a certain scene, the rules defined in devices can be simplified. With the discovery of compositions, further optimization can be implemented to redeploy the services so that services in the same scene are physically deployed on the same server to reach a better performance.
We collect the execution logs again after adjusting the event rules to the new service compositions. To evaluate the performance, we compare two sets of log data, one from the week right before redeploying the service composition and the other from the week right after applying our method (see Table 1 and the corresponding Figure 10). It is assumed that, across the two consecutive weeks, the user behavior and the operation of the application do not change much. As we can see in the result, after duplicate requests and meaningless events are removed, the total number of events is reduced owing to the simplification of the process. To complete the same functional requirement, the number of events in each case is greatly reduced. The relative percentage of events that may be caused by users' hesitation, like "Select City" and "Switch Province", is also reduced. Thus the efficiency of process execution is improved.
As to privacy issues, first of all, the input of our approach is the system log that contains service requests. These requests do not contain sensitive data such as credit accounts. Our method uses only the necessary data that is usually used for system maintenance. And after process mining, the mining result is a summary of all the behaviors rather than an operation

Related Work
The existing approaches that perform service discovery and service composition will be discussed in this section.For service selection solutions, in [9], a service selection technique is proposed to select the best potential candidate service from a set of functionally equivalent ones.The approach in [17] takes several aspects such as QoS, user preference, and the service relationship into consideration.And the work [18] proposes an effective approach to extract events and their internal links from large-scale data with predefined event schema.
As to context-aware dynamic service composition approaches and AI planning techniques, [10,19,20] use models at runtime to guide the dynamic evolution of context-aware web service compositions to cope with unexpected situations. Reference [21] proposes a service granularity space for multitenant service composition, which provides a semantic basis for multitenant service composition. In [22], a methodology based on process mining is proposed for business process analysis in health care environments to identify regular behavior, process variants, and exceptional medical cases.
As to optimizing existing services, there are few approaches to service composition in the area of service mining, such as service composition analysis and optimization. The following works are devoted to optimizing existing service compositions by mining patterns from existing data. A mining algorithm based on statistical techniques is proposed in [23] to discover composite web service patterns from execution logs in order to better understand, control, and eventually redesign the composite services, while [24] proposes an approach that generates service composition patterns for cloud migration from a set of service composition solutions by graph similarity analysis. In [25], an event-based monitoring approach for service composition infrastructures is presented to provide holistic monitoring by leveraging Complex Event Processing techniques. In summary, the works [23][24][25] use data mining instead of process mining.
Our work proposes a service composition approach based on process mining, which is aimed at improving the adaptiveness and efficiency of compositions considering the expanding scale and variety of devices in mobile information systems. We compare our work with recent approaches in the service composition research area, that is, the QoS-based service composition approach [9] and the context-aware dynamic service composition approach [10], in Table 2. In terms of the main objectives of these three approaches, our service composition approach is based on process mining and can select services according to the result of the process mining, while the other approaches focus either on performance or on the context environment. Although it is hard to execute the data with existing approaches, our approach is more suitable in some cases. Our approach

Figure 2: Process mining based service composition models.

Figure 4: Mobile medical system in IoE.

Figure 5: Mapping service request with event dictionary.
Figure 7: (a) Snippet of generated XES file; (b) configuration of fuzzy mining; (c) output of process mining; (d) output in XML format.

Table 1: Comparison of event frequencies before and after our optimization.