FastFlow: Efficient Scalable Model-Driven Framework for Processing Massive Mobile Stream Data

Massive stream data mining and computing require dealing with an infinite sequence of data items with low latency. As far as we know, current Stream Processing Engines (SPEs) cannot handle massive stream data efficiently because they lack horizontal computation modeling and interactive query support. In this paper, we detail the challenges of stream data processing and introduce FastFlow, a model-driven infrastructure. FastFlow differs from existing SPEs in its user-friendly interface, support for complex operators, heterogeneous outputs, extensible computing model, and real-time deployment. Further, FastFlow includes optimizers that reorganize the execution topology for batch queries to reduce resource cost, rather than executing each query independently.


Introduction and Motivation
Tremendous and potentially infinite volumes of data streams are generated by real-time surveillance systems, communication networks, online transaction processing, and other dynamic environments [1]. Due to the high commercial and research value of these massive stream data, numerous Stream Processing Engines (SPEs) have been proposed by both industry and the academic community to process them. Differing in their system architectures, operation models, and/or rule languages, many of these SPEs have matured into notable products during the last decade [2] (e.g., StreamBase [3], InfoSphere Streams [4], Storm [5], and Apama [6]).
These SPEs may offer very different functionalities because of their diverse purposes and focuses (e.g., stock market analysis, utility studies, and process and quality control), but due to the unique characteristics of stream data, all of them should meet two common requirements: a one-pass scan without persistent storage of the whole stream, and processing the stream on the fly. Note that a traditional DBMS (Database Management System) needs to store and index data before analyzing it and thus cannot fulfill these requirements very well [7].
Although a number of existing SPEs offer solutions to the Complex Event Processing (CEP) problem, we argue that two main challenges require better solutions.
First, traditional SPEs (e.g., Esper, SQLStream, and StreamBase) are designed to handle medium-sized record sequences. Hence, the computation power of a single server is not enough when the volume of stream data is huge. As Figure 1(a) shows, a common solution to this issue is a scale-up strategy that partitions and replicates data across multiple servers. However, this scale-up solution makes it hard to reuse intermediate results, because different logics are assigned to separate computing nodes, and thus may waste computing resources on redundant computation.
Second, lack of interaction during the analytical process reduces the flexibility of these SPEs. For instance, Storm and Apache S4 supply only primitive operations to analysts and use a Directed Acyclic Graph (DAG) to schedule execution logic. A DAG is a computing model in which each edge represents a stream and each vertex is a primitive operator. Figure 1(b) depicts the DAG model. With the DAG model, business logic must be predefined, translated into a machine-understandable program by software developers, and then deployed.

Figure 1: Two implementation architectures for handling massive stream data. (a) The scale-up model, which splits the input stream into several substreams by a hash function or manually. Analysts define business logic requirements as separate rules that are sent to operator nodes in a queue. The relation of source, operator, and sink is (1-1-1). (b) The horizontal distributed computing model, which treats rules as a set and shares stream data among multiple operators. Hence, the relation of source, operator, and sink is (1-n-n).

We focus on mining the dynamically updated statistical mobile data of a user track system. More specifically, the system continuously collects semistructured stream data and incorporates essential data mining techniques such as association and characterization in real time. Since traditional database systems cannot handle these temporally ordered, fast changing, massive, and potentially infinite data [7], we present a novel stream data processing framework, FastFlow, which consists of a web UI model, an XML-formatted metamodel, and an execution model. The unique features of FastFlow include the ability to change business logic on the fly and better scaling of computing power. We will show that a SQL user should be able to grasp stream processing quickly.
Current Situations and Challenges. We compare features of common SPEs with FastFlow in Table 1. Some engines (e.g., Esper, Drools) support changing rules on the fly but cannot deal with massive data very well due to their scale-up distribution designs [8], while other systems (e.g., Storm, Apache S4) support easily extensible deployment but offer only primitive operators. In contrast, FastFlow supports both a horizontal computing model and flexible logic definition. In this paper, our framework supports only some simple stream queries and operations, but it can be extended to handle more complex queries and rules.
Our Contributions. In this paper, we make the following contributions.
First, FastFlow supplies a user-friendly web UI for users to build and deploy business logic models on the fly; that is, FastFlow allows users to define and modify business logic without involving programmers.
Second, we model the cost of primitive operations and achieve accurate estimation of the global cost of a given business logic.
Third, FastFlow contains optimizers (a local optimizer for single queries and a global optimizer for batch queries) that effectively reorganize the query execution topology to enhance efficiency.

Mobile Stream Data.
In this paper, we categorize the features of mobile data into two classes: demographic features and activity features. Demographic features include static properties of the mobile device such as device manufacturer (e.g., Samsung, Apple, and Xiaomi), smartphone model (e.g., iPhone 5S, Note 2, and Mi3), OS version (e.g., iOS 7 or Android 4.0), device screen size, CPU, RAM, and device price. Activity features, which record mobile events, include GPS location, data collected by various sensors, application list, connected network type, and currently running applications. Hence, activity features present a stream of up-to-the-minute activity on the device that reflects user behavior [9].

Data Collection and Storage.
In SPEs, once a stream data item is processed, it is usually discarded and cannot be processed again, since SPEs do not have enough space to archive data. FastFlow works under the same constraint. In the context of mobile stream data, whenever the application status changes, the SDK generates an event record and uploads it to the collection server. Millions of application events continuously produce a huge volume of data streams. Before sending data to FastFlow, we apply dictionary coding and duplicate removal in distributed memory storage systems such as Redis [10] and Kafka [11].

Model Driven Architecture on Big Data.
Model Driven Architecture (MDA) is used to describe the concepts and their relations for a system framework [12]. A system can be expressed by an abstraction model at different levels, where each level presents certain aspects of the system. The Object Management Group (OMG) [13] defines three different models of MDA: concept model, metamodel, and platform model. The concept model has the highest abstraction and is described from a business viewpoint. The metamodel is the platform-independent level with a technological viewpoint. The platform model gives the most detailed information about the system implementation and generates execution code for various running environments.
MDA promotes the creation of machine-readable, highly abstract models that are developed independently of implementation technology. Several MDA-related approaches have been proposed for migrating systems from traditional platforms to cloud platforms [9]. The work in [14] proposes a model-driven tool for analyzing real-time stream processing applications so that performance bottlenecks can be pinpointed.
The idea and mechanism of MDA are adopted in our paper. FastFlow accepts the concept model through the web UI, transforms it into a metamodel with optimizers, and finally implements the model on a specific Storm platform.

Model-Driven FastFlow Framework
In this section, we give an overview of FastFlow. Following the model-driven architecture in Figure 3, we define all three
The separate operator has a single stream source as input and copies or shuffles it to several successors.
TimeNode. TimeNode (TN) in FastFlow defines the sliding time window for executing the query model and manages fine-grained event expiry. TN informs the engine how long each relevant query task should be retained and also improves the efficiency of cluster resources by discarding expired tasks.
CacheNode. The main purpose of CacheNode (CN) is to temporarily save data in local or distributed storage. Since some operators are aggregation based or ranking based (e.g., sum, count, and top-k), CN saves and updates the current event status tentatively and can be treated as a connection node among different ONs.
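As a rough illustration of how a TimeNode and a CacheNode might cooperate, the following Python sketch keeps a running count per key inside a sliding window and rolls the cached count back when events expire. The class name `SlidingWindowCounter`, its methods, and the sample events are hypothetical and not part of FastFlow; the sketch also assumes that timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Hypothetical stand-in for a TimeNode (expiry) feeding a CacheNode (cached counts)."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.events = deque()   # (timestamp_ms, key) in arrival order
        self.counts = {}        # cached aggregate: key -> count inside the window

    def add(self, timestamp_ms, key):
        self._expire(timestamp_ms)
        self.events.append((timestamp_ms, key))
        self.counts[key] = self.counts.get(key, 0) + 1

    def _expire(self, now_ms):
        # Drop events that fell out of the window and roll back their cached counts.
        while self.events and self.events[0][0] <= now_ms - self.window_ms:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top_k(self, k):
        # Rank by descending count, breaking ties by key for a stable result.
        return sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


window = SlidingWindowCounter(window_ms=30000)
window.add(0, "Samsung")
window.add(10000, "Apple")
window.add(20000, "Samsung")
window.add(35000, "Xiaomi")  # the event at t=0 is now outside the window and expires
```

Keeping the aggregate in a small cache means a count or top-k query never has to rescan the stream, which fits the one-pass requirement.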
Operator nodes and their relations can be described as CM($N$, $E$), where $N$ is an OperatorNode, TimeNode, or CacheNode (e.g., select, where, and from) and $E$ can be presented as an arrow from one node to its successor, written ($N_{start}$, $N_{end}$). For example, the query "select userid from events where model = 'Samsung'" consists of three OperatorNodes: select (userid), from (events), and where (model = "Samsung"). Select, from, and where are operators, while the objects in brackets are attribute items. One possible implementation of the query is shown in Figure 4, in which all three operators are linearly ordered. When a tuple of stream data arrives, it is processed from the first node through to the last node.
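A minimal sketch of the node-and-edge description above, in Python. The `OperatorNode` dataclass, `build_chain`, and `run` are hypothetical names chosen for illustration, and the from-where-select order is only one possible linear ordering, assumed here rather than taken from Figure 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorNode:
    op: str      # operator name: "from", "where", or "select"
    arg: object  # attribute item(s) the operator works on


def build_chain(nodes):
    """Return the edge list (start, end) linking the nodes in linear order."""
    return list(zip(nodes, nodes[1:]))


def run(nodes, tuples):
    """Push every tuple through the chain; a filtered-out tuple yields nothing."""
    for t in tuples:
        for node in nodes:
            if node.op == "where":
                attr, value = node.arg
                if t.get(attr) != value:
                    t = None
                    break
            elif node.op == "select":
                t = {a: t[a] for a in node.arg}
            # "from" only names the source, so the tuple passes through unchanged
        if t is not None:
            yield t


# select userid from events where model = 'Samsung'
nodes = [
    OperatorNode("from", "events"),
    OperatorNode("where", ("model", "Samsung")),
    OperatorNode("select", ("userid",)),
]
edges = build_chain(nodes)
events = [
    {"userid": 1, "model": "Samsung"},
    {"userid": 2, "model": "Apple"},
    {"userid": 3, "model": "Samsung"},
]
result = list(run(nodes, events))
```

Each tuple visits the nodes in order, so a tuple rejected by the filter never reaches the projection.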
Besides the above three basic operators, we use a directed acyclic graph (DAG), formed by a set of vertices and edges, to model business logic in partial order. A vertex represents a single operator and an edge represents a constraint. DAGs can be used to describe processes in which data flows in a consistent direction through a network of processors. Besides taking operator functions into account, our framework also considers the reuse of intermediate results in the DAG to reduce the cost of cluster resources.

Metamodel Structure and Persistence.
The metamodel is defined by source, operator, and sink. In addition, we describe the cluster environment, such as the number of nodes in the cluster, the parallelism of each node, and the throughput of sources and sinks. We choose Extensible Markup Language (XML) as our persistence model, because it is structured and makes it easy for stakeholders to define business semantics. A variety of XML tools and APIs have been developed, and rich online documentation and examples are readily available. Since the metamodel is not platform dependent, it can be exchanged and transferred across multiple platforms [15]. In FastFlow, we describe the metamodel in three parts: running environment, data structure description, and execution topology. Algorithm 1 shows the structure of the metamodel in XML.
We use the element "configuration" to denote the features of the processing cluster. The element "executionTime" defines the range of the sliding window, that is, 30000 ms, while "executionType" specifies whether the topology executes in cluster or local mode. The element "dataFormat" gives the schema of the stream data, including data source connection information, data type, and attribute name. Finally, "operators" describes the function type, Storm grouping type, and operating parameter condition.
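To make the element roles concrete, here is a hedged Python sketch that parses a small metamodel document with the standard library. Only the element names "configuration", "executionTime", "executionType", "dataFormat", and "operators" come from the description above; the root element, the child elements `field` and `operator`, and every attribute name are assumptions made for illustration and may differ from the actual FastFlow schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metamodel; everything except the five named elements is assumed.
METAMODEL_XML = """
<metamodel>
  <configuration>
    <executionTime>30000</executionTime>
    <executionType>cluster</executionType>
  </configuration>
  <dataFormat source="events">
    <field name="userid" type="int"/>
    <field name="model" type="vchar"/>
  </dataFormat>
  <operators>
    <operator type="where" grouping="shuffle" condition="model = 'Samsung'"/>
    <operator type="select" grouping="shuffle" fields="userid"/>
  </operators>
</metamodel>
"""

root = ET.fromstring(METAMODEL_XML.strip())

# Running environment: sliding-window range and execution mode.
config = {child.tag: child.text for child in root.find("configuration")}

# Data structure description: attribute names and types of the stream.
schema = [(f.get("name"), f.get("type")) for f in root.find("dataFormat")]

# Execution topology: one attribute dictionary per operator, in document order.
operators = [dict(op.attrib) for op in root.find("operators")]
```

Because the document is plain XML, any platform-specific generator can read the same three parts without depending on the web UI.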

FastFlow Processing via MDA.
According to the components in MDA model, the overview of FastFlow framework processing includes four parts in Figure 5: user-friendly web UI, FastFlow engine, cluster environment, and data source.
Business analysts describe logic via the web UI. The logic is then transformed into a metamodel. The FastFlow engine accepts the metamodel and parses it into operator nodes, applies optimizers to reorganize the execution topology, and generates machine-understandable execution code. The execution code is distributed to the cloud cluster. Once the code is executed on the cluster, outputs are extracted continuously. Note that the local optimizer and the global optimizer included in the FastFlow engine are designed to generate the optimal execution topology.

Operator Cost Estimation and Optimization
In this section, we focus on how to transform the metamodel into an execution plan efficiently. We introduce how to measure resource cost for different operators and give two optimizers that improve the efficiency of resource utilization.

Estimate Costs of Operators.
Since users cannot access the model transformation functions directly, we need an optimizer that accepts the metamodel, parses it, and finds the best execution plan. In a data streaming environment with multiple nodes in a scalable cluster, such as Storm or S4, the data processing plan is treated as a DAG and allocated to several connected nodes. The primary resources in a cluster are available CPUs, memory, and network bandwidth. Hence, our optimizers aim to optimize the utilization of these resources.
There are still two challenges left for optimizing a plan. The first is how to weight resources, because a variety of applications run in heterogeneous clusters, and different application administrators focus on different facets according to their available resources. For example, when the cluster is short of CPU, the efficiency of an algorithm carries more weight than I/O performance. In this paper, the term $w_r$ is defined for the resource of type $r$; for example, $w_{\mathrm{CPU}}$ refers to the CPU resource. Instead of treating all resources as equally important, we assign a specific value $w_r$ to every resource type. Moreover, for distributed cluster systems, if we want to monitor the utilization of a certain computing node, the term can be written as $w^{op}_{r}$, where $r$ is the resource type and $op$ is the computing node.

The second challenge is how to predict the resource cost of different implementations. We introduce statistics-based approaches to estimate resource cost. Suppose that a tuple in the stream data has $n$ attributes of type int, vchar, and so forth. We give a naive estimation model and a complex estimation model to predict the field size (PFS) for the projection operation (i.e., select). The naive estimation model considers only the number of attributes in a tuple, regardless of attribute type; for example, given an $n$-attribute tuple, the PFS for $m$ selected fields is $m/n$. The complex estimation model takes the field type into account and defines

$$\mathrm{PFS}(\mathrm{attr}_i) = \frac{\mathrm{FieldSize}(\mathrm{attr}_i)}{\sum_{j=1}^{n} \mathrm{FieldSize}(\mathrm{attr}_j)}.$$

Furthermore, selection operations (i.e., where) keep a small part of the records and discard the others. The selectivity of a condition $\theta_i$ is the probability that a tuple in the source $DS$ satisfies $\theta_i$. If $s_i$ is the number of satisfying tuples in $DS$ and $n_{DS}$ is the total number of tuples in $DS$, the percentage of selected size (PSS) is $s_i / n_{DS}$. For a conjunction of selection operations $\theta_1, \theta_2, \ldots, \theta_k$, the estimated cost, assuming the conditions are independent, is

$$\mathrm{PSS}(\sigma_{\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_k}(DS)) = \prod_{i=1}^{k} \frac{s_i}{n_{DS}}.$$

We treat the reduction factor $f_{op}$ as the PFS or PSS of a specific operator $op$ and estimate the global resource cost by

$$gc = \sum_{op \in \langle ops \rangle} \; \sum_{r \in \langle \mathrm{CPU}, \mathrm{mem}, \mathrm{bandwidth} \rangle} f_{op} \times w^{op}_{r}.$$

A topology with a smaller $gc$ is likely to use resources more efficiently.
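The estimators described above can be sketched in a few lines of Python. The function names, the field sizes in bytes, and the weight values are illustrative assumptions; the conjunction estimate multiplies the individual selectivities, which additionally assumes that the conditions are independent.

```python
def pfs_naive(selected, all_fields):
    """Naive model: share of selected attributes, ignoring their types."""
    return len(selected) / len(all_fields)


def pfs_complex(selected, field_sizes):
    """Complex model: share of selected bytes in the whole tuple."""
    total = sum(field_sizes.values())
    return sum(field_sizes[a] for a in selected) / total


def pss(selectivities):
    """Combined selectivity of a conjunction, assuming independent conditions."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result


def global_cost(reduction, weights):
    """Sum over operators and resource types of reduction factor times weight."""
    return sum(
        reduction[op] * w
        for op, per_resource in weights.items()
        for w in per_resource.values()
    )


# Hypothetical field sizes in bytes and hypothetical resource weights.
field_sizes = {"userid": 4, "model": 16, "ip": 12}
reduction = {
    "select": pfs_complex(["userid"], field_sizes),  # 4 / 32
    "where": pss([0.5, 0.2]),
}
weights = {
    "select": {"CPU": 1, "mem": 1, "bandwidth": 2},
    "where": {"CPU": 2, "mem": 1, "bandwidth": 1},
}
gc = global_cost(reduction, weights)
```

With these made-up inputs the naive model rates the projection at one third of the tuple, while the complex model rates it at one eighth, which shows how field types can change the estimate.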

Equivalence Rule Optimization.
In order to optimize the execution topology, we need criteria that guarantee the functional semantics stay stable across different implementations. An equivalence rule describes a set of execution processes such that, given the same inputs, every process produces the same result. For example, in the query "select userid from events where model = 'samsung'", the from (events) operator specifies the data source, and the other operators describe the projection and filter functions. The execution paradigms "from-select-where" and "from-where-select" have the same outputs and therefore satisfy the equivalence rules. In this paper, we focus on a single stream source with multiple queries for simplicity and define five equivalence rules, including the following.
ER5 (equivalent exchange). Projection and filter operations can be exchanged in linear order: $\Pi_{A}(\sigma_{\theta}(DS)) = \sigma_{\theta}(\Pi_{A}(DS))$.
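The exchange of projection and filter can be checked on a toy stream. The Python sketch below uses hypothetical helper names and made-up events; note that the exchange only holds when the filtered attribute is kept by the projection, which is the case for the query used here.

```python
def project(fields, tuples):
    """Projection: keep only the listed attributes of every tuple."""
    return [{f: t[f] for f in fields} for t in tuples]


def select_where(attr, value, tuples):
    """Filter: keep only the tuples whose attribute equals the value."""
    return [t for t in tuples if t[attr] == value]


# Hypothetical events for: select nettype, region, model from events where nettype = "Wi-Fi"
events = [
    {"nettype": "Wi-Fi", "region": "north", "model": "Mi3", "user": "u1"},
    {"nettype": "4G", "region": "south", "model": "Note 2", "user": "u2"},
    {"nettype": "Wi-Fi", "region": "east", "model": "iPhone 5S", "user": "u3"},
]
fields = ("nettype", "region", "model")

where_then_select = project(fields, select_where("nettype", "Wi-Fi", events))
select_then_where = select_where("nettype", "Wi-Fi", project(fields, events))
```

Both orders return the same two tuples, so an optimizer is free to pick whichever order is cheaper.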
The above five rules are used in the query transformation process and are designed to reduce resource cost via alternative implementations. A similar idea can be found in traditional query optimizers [16]; however, in this paper these rules do not only target single query processing but are also intended to optimize multiple queries across cluster nodes. By using the equivalence rules, we can translate business logic into multiple metamodels and estimate the resource cost of each model. In this section, we present the algorithms of the local optimizer and the global optimizer.

Algorithm 2: Local optimizer.
Input: <operationModel>: operation model list
       $f_{op}$: cost of CPU, memory, and network for specific operators
       $w^{op}_{r}$: weight for resource and operator
Output: metaModel: the optimal execution way
(1) Initialize (0-1)-matrix operatorMatrix($op_1$, $op_2$)
(2) metaModel ← <>
(3) metaModel ← opM_{sliding window}
(4) vertex ← 0, minweight ← 0
(5) for all opM in operationModel do
(6)     if opM.type = source selection then
(7)         metaModel ← opM_{source}
(8)         operationModel \ opM_{source}
(9)     else
(10)        vertex ← opM_{source}
(11)        for all opM in operationModel do
(12)            if operatorMatrix(vertex, opM) = 1 and min{$w_{opM}$, minweight} < minweight then
(13)                vertex ← opM and operationModel \ opM and metaModel ← opM
(14)                minweight ← min{$w_{opM}$, minweight}
(15)            else
(16)                continue
(17)        end for
(18)    end if
(19) end for
(20) return metaModel

Local Optimizer.
The local optimizer is responsible for generating the optimal execution plan for a single query task. As described in Figure 4, a single task can be expressed as a linear sequence with a specific order. Algorithm 2 estimates the cost of a given strategy and finds the optimal way to implement the business logic.
In Algorithm 2, we denote the operators of each query as a set of tuples <operationModel>. We first generate operatorMatrix, a (0, 1)-matrix built according to the equivalence rules, to express whether two operators can be connected. The sliding-window operator and the source operator are put at the beginning (i.e., lines 3-10). For the other operators, we connect them by taking both the equivalence rules and the estimated resource cost into consideration. We evaluate the cost of each operator and put the node with the smaller cost in front (i.e., lines 11-19), so that most of the data can be discarded earlier to release resources.
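The greedy idea described above can be approximated by the following Python sketch. It is not a transcription of Algorithm 2: the function `local_optimize`, the `ALLOWED` table standing in for the (0, 1) operator matrix, and the cost values are all hypothetical, and the sketch assumes the query contains a source operator.

```python
def local_optimize(operators, cost, can_follow):
    """Order operators greedily: window and source first, then cheapest allowed next.

    cost maps an operator to its estimated reduction factor; can_follow(prev, nxt)
    plays the role of the (0, 1) operator matrix.
    """
    head = ("window", "from")
    plan = sorted((op for op in operators if op in head), key=head.index)
    remaining = [op for op in operators if op not in head]
    while remaining:
        candidates = [op for op in remaining if can_follow(plan[-1], op)]
        if not candidates:
            raise ValueError("no operator may follow " + plan[-1])
        best = min(candidates, key=lambda op: cost[op])
        plan.append(best)
        remaining.remove(best)
    return plan


# Hypothetical connection table: which operators may directly follow which.
ALLOWED = {
    "from": {"select", "where"},
    "where": {"select"},
    "select": {"where", "distinct"},
    "distinct": {"where"},
}


def can_follow(prev, nxt):
    return nxt in ALLOWED.get(prev, set())


# Hypothetical reduction factors; a smaller value discards more data.
cost = {"select": 0.3, "where": 0.1, "distinct": 0.05}
plan = local_optimize(["select", "distinct", "where", "from", "window"], cost, can_follow)
```

Although distinct has the smallest cost in this example, the connection table only allows it after select, so the plan starts with the cheap filter and places distinct last.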

Global Optimizer.
The global optimizer focuses on cluster performance across multiple queries. For each data stream, there are many queries for different business logic, which can be clustered into multiple sets by their operators. Queries within the same set are more similar to each other than to queries outside it. For instance, there are four query models Q1, Q2, Q3, and Q4 in a cluster, which are listed in Box 1. In terms of operator nodes, these queries share common parts with each other (such as select region or where nettype = "Wi-Fi").
Unlike traditional DBMSs (MySQL, PostgreSQL) and scale-up CEP engines (Drools, Esper), which treat queries as independent and handle them separately, FastFlow integrates multiple execution paths and generates a global execution topology. We show three basic topology connection types in Figure 6: sequence, AND split, and AND join. Sequence supports both single and multiple queries. AND split and AND join are suitable for merging multiple queries. Any topology with complex logic can be composed from these three basic models.
We present the global optimizer in Algorithm 3, which consists of four separate steps: extracting common operators, evaluating the similarity of queries, running the local optimizer, and running the global optimizer.
We use Figure 7 to illustrate the process of the global optimizer. First, we extract common operators, which are shared by more than one query, from simple to complex models, categorize the operators into three groups, and assign operator codes from one to six. Then, we use the Jaccard similarity coefficient [17], a statistical method for comparing the similarity of different sets, to calculate the score of pairwise queries. Jaccard similarity (JS) can be expressed as $JS(A, B) = |A \cap B| / |A \cup B|$, where $A$ and $B$ are two sets. As Figure 7 shows, set Q1 has code (1, 5) and Q2 has code (1, 2, 3, 5); thus the Jaccard similarity JS(Q1, Q2) = 0.5. We list all similarity scores for the six query pairs in Steps 1-2. Next, Step 3 plans the resource-cost optimal execution path for every query by Algorithm 2. In Step 4, we merge the locally optimal queries in descending order of their similarity scores. Note that the execution order in the global optimizer might differ from the order in the local optimizer, because we need to preserve the query semantics of all queries in the global topology. For example, in order to merge Q2 and Q3, the filter operator (code 4) should be moved from the beginning node to the end node. Q1 changes the order of its execution path as well, so that it matches the existing topology completely.

Mobile Information Systems

Q1: select nettype, region, model from events where nettype = "Wi-Fi" [Range 10 min]
Q2: select nettype, region, model, user, receivetime, ip from events where nettype = "Wi-Fi" [Range 10 min]
Q3: select nettype, region, model, user, receivetime, ip from events where nettype = "Wi-Fi" and ip contains "117.136" [Range 10 min]
Q4: select distinct(user) from events where nettype = "Wi-Fi" [Range 10 min]
Box 1: Four queries in a certain cluster with high relativity.
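The pairwise scoring step can be reproduced with a short Python sketch. The operator codes for Q1 and Q2 are the ones quoted in the text; the code set given for Q3 is a hypothetical value added only to show the ranking, and the function and variable names are illustrative.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets of operator codes."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


query_codes = {
    "Q1": {1, 5},           # from the text
    "Q2": {1, 2, 3, 5},     # from the text
    "Q3": {1, 2, 3, 4, 5},  # hypothetical, for illustration only
}

# Score every pair and sort so that the most similar pair is merged first.
pairs = sorted(
    ((jaccard(query_codes[x], query_codes[y]), x, y)
     for x, y in combinations(sorted(query_codes), 2)),
    reverse=True,
)
```

Sorting the pairs in descending order gives the merge order used when the locally optimal plans are combined into one topology.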

FastFlow Implementation
According to the three phases in the MDA architecture, the concept model is defined by queries, while the platform-independent model (the meta execution plan) is produced by the optimizers and stored in structured XML files. Now, we focus on the last phase: how to turn the execution plan into running code for a specific platform. Apache Storm is a free and open source distributed real-time computation system that can be used for real-time analytics, online machine learning, continuous computing, and ETL. Moreover, it is scalable, fault-tolerant, and easy to operate and extend. For these reasons, we use Storm to set up FastFlow.
Figure 8 shows the class design of the FastFlow framework. Some classes inherit from Storm interfaces. FastFlow contains three main execution components, which are also defined in Storm: ISpout, IBolt, and Execution Topology. ISpout defines the input source of the system, IBolt processes stream data step by step, and Execution Topology describes the global running plan for multiple given tasks. IBolt can be implemented by various operation types such as the monadic operator, combine operator, and separate operator. TaskMetaModel corresponds to the metadata model defined in the XML configuration file. The Kafka component is treated as a storage node. The two optimizers cooperate with Execution Topology.
All the generated code is deployed to the Storm environment for horizontal scaling.

Experimental Conditions
Platforms. FastFlow is set up on a four-machine cluster running CentOS 6.3 and JDK 7. The cluster consists of Dell servers with 8-core Intel Xeon 2.27 GHz CPUs and 6 TB hard disks, interconnected by a 1 Gbps network. Each server has 63.0 GB of RAM. The version of Storm is 0.9.0.1. We build the web query interface with PHP and jQuery and use Data-Driven Documents (D3) [18] to output results.
Resource Monitor. In order to monitor the resource cost of processes in FastFlow, we use the Atop tool, an ASCII full-screen performance monitor, which reports process-related resources such as CPU utilization, memory consumption, and network throughput. Atop is installed on all cluster servers, with a program collecting a machine report every second for further analysis.
Note that the Storm project does not support resource monitoring for spouts and bolts. In order to track the resources of every bolt, we set multiple ports for different Storm workers and control the number of executors. Then, we can get the worker status by its process ID. In order to make the experiment unbiased, we use multiple complex operators and run FastFlow for one minute.

Estimate Primitive Operator Performance and Resource Weight.
We first analyze the overhead of each primitive operator separately on a single server. We choose several common operators, such as select, where, distinct, and groupby, and evaluate their resource cost in terms of CPU, memory, and network bandwidth. The resource monitor collects 60 tuples, and the cost results are presented in Figures 9-11. In these figures, the number in parentheses after a projection label is the number of fields selected in the query, and the number after a filter label is the number of where conditions used in our experiments. In Figure 9, under a throughput of 2000 tuples per second, we notice that as the size of the projection increases, CPU utilization stays around 70% and memory utilization stays around 240 MB. CPU and memory are evidently little affected by projection. However, the output bandwidth depends heavily on the complexity of the projection: the more attributes extracted, the larger the output bandwidth needed.
Figure 10 shows the bandwidth cost for different filter conditions and projection operators. We observe that the decrease in efficiency of a filter operator (e.g., where) is usually smaller than that of a projection operator (e.g., select). This is because filter operators prune tuples based on the distribution of data values, whereas projection operators prune data by fixed structure; thus, when an additional condition is merged, the size of the conjunction set decreases rapidly. For the advanced operators (i.e., grouping, regex rule, and distinct), we report their CPU and memory utilization in Figure 11. We find that the group operator is the most costly, since it has to apply a hash function to each item for allocation.
The experiments above estimate the resource cost of various operator types. Now we show the influence of stream data throughput. We vary the stream data emission rate from 2000/second to 20000/second and present the results in Figure 12. We can see that CPU usage increases with throughput, whereas memory usage is not much affected by varying throughput.
In order to set the resource weights of CPU, memory, and bandwidth, we should take into consideration both the above analytical experiments and the limits of these resources in our cluster. According to the resource occupation percentages above, we set the related weights for various operators. Setting the value of a tuple <resource, operator> is flexible. In real applications, weights can be predefined by suitable rules or adjusted online according to the current cluster status. Table 2 shows the fixed values used in our following experiments, obtained with the self-defined formula $w^{op}_{r} = (\mathit{slop} + \mathit{percent}) \times \alpha$, where the slop factor accounts for the impact of increasing data size and $\alpha$ is a fixed constant used to magnify the final value.

Local Optimizer Performance.
In this section, we report the performance of the local optimizer and compare its effectiveness with the normal execution model. Since the initial input stream and the final output results are the same no matter which execution plan is used, we focus only on the internal bandwidth between operators. Taking Q1-Q3 in Box 1 as examples and following the process in Figure 7, we can estimate the priorities of different queries and different execution plans.
Recall that there are two cost-based estimation models: the naïve model and the complex model. For the naïve estimation, in our experiment, there are a total of 55 fields in each tuple with equal weight, while for the complex estimation, the field type and definition are considered. We calculate the cost of the fields in Q1-Q3 as follows: nettype = 0.016, region = 0.016, model = 0.016, user = 0.052, receiveTime = 0.052, and ip = 0.082. We also give the distribution of data values, which can be collected from sample data. In our case, the cost of "nettype = Wi-Fi" is 0.21 and the cost of "ip contains 117.136" is 0.14. For each query, we give two plan paths and evaluate their scores with both estimation models. Note that we use the local optimizer to generate execution plan 1 but not plan 2 (i.e., its query order is not changed).
In Table 3, we first run queries Q1-Q3 with two different strategies and treat execution plan 1 as the optimal strategy and execution plan 2 as the baseline strategy. For both execution plans, we apply the naïve model and the complex model to estimate resource cost. We find that the optimal plan chosen under different estimation methods is not always the same. In Table 3, the naïve model chooses plan 2 as the optimal one, which does not agree with the actual evaluation. Hence, the complex model is more accurate than the naïve model, especially when the field types are diverse. Figure 13 presents the three types of resource cost for Q1, Q2, and Q3. It shows that the optimal execution plan delivers significantly better performance.
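The field costs and filter costs quoted above are enough to compare the two estimation models on a toy ordering decision. The Python sketch below is only an illustration of that arithmetic: the function names are hypothetical, the rule "run the operator with the smallest estimate first" is a simplification of the local optimizer, and the results are not the values reported in Table 3.

```python
TOTAL_FIELDS = 55
FIELD_COST = {
    "nettype": 0.016, "region": 0.016, "model": 0.016,
    "user": 0.052, "receiveTime": 0.052, "ip": 0.082,
}
FILTER_COST = {'nettype = "Wi-Fi"': 0.21, 'ip contains "117.136"': 0.14}


def naive_pfs(fields):
    """Naive model: every one of the 55 fields has equal weight."""
    return len(fields) / TOTAL_FIELDS


def complex_pfs(fields):
    """Complex model: add up the per-field costs quoted in the text."""
    return sum(FIELD_COST[f] for f in fields)


def cheapest_first(fields, filters, pfs):
    """Return the operator with the smallest estimate (simplified ordering rule)."""
    candidates = {"select": pfs(fields)}
    candidates.update({f: FILTER_COST[f] for f in filters})
    return min(candidates, key=candidates.get)


q1_fields = ["nettype", "region", "model"]
q3_fields = q1_fields + ["user", "receiveTime", "ip"]
q3_filters = list(FILTER_COST)

q3_naive_choice = cheapest_first(q3_fields, q3_filters, naive_pfs)
q3_complex_choice = cheapest_first(q3_fields, q3_filters, complex_pfs)
```

For the six-field projection of Q3, the naive estimate is about 0.109 and the complex estimate is about 0.234, so under this simplified rule the two models disagree on whether the projection or the ip filter should run first.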

Global Optimizer Performance.
In order to estimate the effectiveness of the global optimizer, we first run experiments on Q1-Q4 with two open source tools, Esper and Drools, which are used in Twitter and JBoss. These two tools are best-of-breed rule engines that also offer CEP and are easily integrated with Java programs. We install the Esper and Drools rule engines in a Storm topology separately, rewrite Q1-Q4 in their respective languages, and monitor CPU, memory, and bandwidth for one minute at an emission rate of 2000/second. From the results in Figure 14, the CPU utilization of Esper is higher than that of Drools. However, Drools uses more memory than Esper, since it needs additional working memory to store temporary results. In terms of network, the input stream is copied and assigned to different rule engines in both models. Thus, the bandwidth utilization of the two systems is close for all queries.
For FastFlow, we set up the execution topology with six atomic operators as Figure 7 shows and evaluate their resource cost. In Figure 15, we note that the resource costs of the operators are not balanced; they are highly related to each operator's function as well as its position in the topology. For example, operator 5 needs more CPU and bandwidth than the others because it has to handle the full dataset, which is much bigger than the input of the other operators. In contrast, operator 4 is at the end of the topology and deals with only a small part of the data. These factors should therefore be considered when assigning operators to machines. Finally, we compare FastFlow with Esper and Drools on different execution topologies. Based on Figure 7, we evaluate cost with two goals: (1) verify the performance and effectiveness of FastFlow and (2) test the influence of the topology structure. The cost of Esper and Drools is the sum of the costs of the four queries in Figure 14, and the cost of FastFlow is the sum of the resource costs of the related operator nodes in Figure 15.
As Figure 16 shows, FastFlow has better resource utilization than Esper and Drools in all three dimensions, especially in Step 1 and Step 2, where the topologies are the same, which means that the additional query does not place any extra burden on the whole system.

Related Work
Our work relates to various past efforts in scalable distributed computation, complex event processing, and MDA.Large Scalable Computing.Iterative computing models exist in data mining, information retrieval, and other computingintensive applications.For example, MapReduce and Dryad are two popular platforms to take dataflow by a directed acyclic graph operators [7].HaLoop modifies MapReduce model and supports iterative algorithms [15] and Hone, a "scaling-down" Hadoop, designs a novel mechanism to run program in single multicore high end sever [19].Projects over MapReduce like Apache Pig and Apache Hive aim at supporting aggregate analyses by high level language [20].Another recent framework such as Spark is treated as replacement of Hadoop.Spark proposes a novel structure RDD to process data in memory and achieves 10 times better performance than Hadoop in many use cases [21].
However, MapReduce is suitable for batch job and does not handle stream data as we study in this paper.
Complex Event Processing. Implementations of complex event processing have been documented in previous works. Wu et al. [22] introduce a continuous query language and use native operators to handle these queries. Apama and StreamInsight [23] monitor moving event streams and detect significant patterns based on predefined rules. Moreover, several companies have their own CEP engines, such as Esper, Tibco StreamBase, and Oracle Event Processing. However, these applications focus on complex business logic rather than on the computation model, so they are not suitable for massive data. Horizontal CEP applications such as Storm and Apache S4 [7,24] achieve better scalability but support only basic primitive operators and cannot accept declarative or rule-based queries directly. As a result, Storm and S4 require programmer involvement and offer an unfriendly interaction model.
Model-Driven Architecture in Big Data. MDA is a useful tool for managing complexity and reducing the development effort required on software projects. MDA bridges the gap between analysis and implementation. To the best of our knowledge, little work has been done on MDA in Big Data. SciFlow [25] supplies an efficient mechanism for building a parallel application and enables the design, deployment, and execution of data-intensive computing tasks on Hadoop. The work in [26] monitors the behavior and performance of multiple cloud servers and assigns queries to a suitable server to execute business logic. These systems are built on the MapReduce model, so they do not aim to solve the stream data problem. The work in [27] proposes a model-driven tool for pinpointing bottlenecks in real-time stream processing. However, the StormML and SimEvent models in that research are created manually and are designed for estimating the performance of applications rather than for describing business logic.

Conclusions and Future Work
The era of massive stream data is upon us, bringing with it an urgent need for advanced interactive data processors with low latency. In this paper, we present a framework called FastFlow, which supplies a user-friendly interface, automatic model transformation, extensible scalability, and continuous outputs. The process chain of FastFlow consists of three phases: defining the business logic model, planning the optimal execution, and deploying and running on a horizontal cluster. Moreover, from the resource utilization perspective, we provide local and global optimizers at different levels, which aim to reduce the CPU, memory, and network I/O cost of the cluster and to share results among multiple queries. To validate the effectiveness and efficiency of FastFlow, we have conducted experiments, compared FastFlow with existing CEP engines, and built a prototype for mobile stream data. Many challenges in stream data processing need further research attention. FastFlow cannot yet replace or change execution topologies on the fly. In addition, we currently support only single-source queries, while in practice it is common to face multiple stream sources. Besides SQL-like queries, machine learning and data mining approaches are also important for stream data and will be introduced to FastFlow later.

Figure 5: Overview of processing by FastFlow. It can be organized into three phases, including web-based business concept model definition, FastFlow core engine, horizontal scalability cluster, and various perspectives.
Design. FastFlow includes two kinds of optimizers, a Local optimizer and a Global optimizer, to deal with resource allocation at different levels. The Local optimizer, which focuses on a single query, reorganizes the execution topology to reduce resource cost. The Global optimizer tries to optimize the overall efficiency of multiple tasks, for example, by reusing intermediate results across different tasks.
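One way to picture the reuse of intermediate results is to merge the operator chains of several queries into a prefix tree, so that a common prefix is executed once and its output is fed to every branch. The sketch below is an assumption made for illustration, not the Global optimizer's actual algorithm: the functions `merge_plans` and `count_nodes` and the operator strings are hypothetical, and each plan is assumed to be a linear chain of operators.

```python
def merge_plans(plans):
    """Merge linear operator plans into a prefix tree of nested dicts.

    Plans that start with the same operators share the corresponding nodes,
    so the shared prefix would be computed only once.
    """
    root = {}
    for plan in plans:
        node = root
        for op in plan:
            node = node.setdefault(op, {})
    return root


def count_nodes(tree):
    """Number of operator nodes in the merged topology."""
    return sum(1 + count_nodes(child) for child in tree.values())


# Hypothetical query plans over a single stream source.
plans = [
    ("scan", "filter(region='north')", "project(user, bytes)"),
    ("scan", "filter(region='north')", "count()"),
    ("scan", "project(user, bytes)"),
]

merged = merge_plans(plans)
operators_without_sharing = sum(len(plan) for plan in plans)  # 8
operators_with_sharing = count_nodes(merged)                  # 5
print(operators_without_sharing, operators_with_sharing)
```

Here the `scan` and `filter` nodes are shared by the first two queries, and `scan` is shared by all three, so the merged topology needs five operator nodes instead of eight. A prefix tree only captures sharing at the start of a plan; detecting common sub-plans elsewhere in a DAG would need a more general matching step.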

Figure 6: Basic connection models in a DAG, defined by their source and sink: sequence, AND split, and AND join.

Figure 9: Resource cost for projection operator. Choose various sizes of projection and estimate CPU utilization, memory usage, and output bandwidth.

Figure 10: Bandwidth cost for projection operator and filter operator.

Figure 11: Comparison of resource cost for basic and complex operator types.

Figure 12: CPU and memory cost for different operator types under increasing throughput.

Figure 13: Comparison of resource cost between normal execution plan and optimal execution plan.

Figure 15: Estimate resource cost for all operator nodes.
: <<operationModel>>: a set of operation list  op : cost of cpu, memory and bandwidth for special operation   : weight of different resources

Table 2: Resource weight of various operators in our experiment.

Table 3: Query execution plan and related priority scores of different models (the weights of operators are in Box 1).