An Efficient Data Analysis Framework for Online Security Processing

Industrial cloud security and Internet of Things security represent the most important research directions of cyberspace security. Most existing studies on traditional cloud data security analysis focus on inspection techniques for block storage data in the cloud. None of them consider the problem that multidimensional online temporary data analysis in the cloud may appear as continuous and rapid streams, or that the scalable analysis rules are continuous online rules generated by deep learning models. To address this problem, in this paper we propose a new LCN-Index data security analysis framework for large-scale rules in the industrial cloud. LCN-Index uses the MapReduce computing paradigm to deploy large-scale online data analysis rules: in the mapping stage, it divides each attribute into a batch of analysis predicate sets, which are then deployed onto a mapping node using an interval predicate index. In the reducing stage, it merges results from the mapping nodes using a multiattribute hash index. By doing so, a stream tuple can be efficiently evaluated by going over the LCN-Index framework. Experiments demonstrate the utility of the proposed method.


Introduction
Industrial cloud security services have drawn increasing attention in recent years. A wide spectrum of industrial applications and online industrial control businesses use cloud-fog computing as their fundamental solution to the problem of unprecedented data volume [1][2][3][4][5].
Despite the successes of the traditional cloud, existing cloud protection services focus merely on designing scalable inspection techniques for static block data in the cloud. For many emerging online industrial protection applications in the cloud, data often come in the form of multidimensional continuous temporary tuple streams, and it is urgent to develop scalable stream-based inspection techniques for cloud security computing. Example 1. Let us consider an online traffic management and inspection system as shown in Figure 1. The essential goal of the system is to analyze the security of surrounding traffic information for all connected users. In this system, on the one hand, all traffic data are monitored by on-street cameras and uploaded to analysis nodes in the cloud; on the other hand, all connected users get online continuous surrounding traffic security information from the cloud by inspecting all queries. Note that different users may get different levels of security service depending on their rules or queries.
(1) The online analysis queries may be very large and complex in the cloud. For example, in the job recommendation application, there are usually more than one million job applicants, and each applicant may have more than a hundred attribute items. (2) The query set has a dynamically changing nature. For example, in online web monitoring, web masters may need to add new queries to the query set and delete outdated ones. (3) A timely response is demanded for all queries, even though the stream data are very complex. For example, in the web monitoring application, it is often the case that tuples have a size of more than 40 bytes and flow at a rate faster than 10^6 tuples per second, and the system is supposed to return all tuples that match the monitoring queries.
Faced with the above new characteristics of stream-based querying in the cloud, how can we efficiently evaluate all upcoming tuples with respect to all registered queries? Traditionally, in data stream query systems, a centralized indexing structure is built on a master server. Then, for each upcoming tuple, the system traverses the centralized index structure to answer the queries. However, such a centralized method cannot be used in our new problem setting. This is because the size of the indexing structure grows with the number of queries. Besides, the system response would also become a bottleneck. Therefore, in cloud systems, we are unable to build a single centralized indexing structure to unify all the scalable and complex queries.
On the other hand, in the cloud, conventional distributed methods are also impractical. This is because we may need to frequently register new queries (or delete outdated queries) in the cloud, which makes it very difficult to decide how much computing power should be assigned. Conventional distributed methods have the obvious downside of lacking elasticity.
Given the limitations of existing methods, in order to solve the stream query problem, the following three challenges should be addressed: (1) Scalability: traditional data stream processing studies [6][7][8][9][10] usually assume that the number of queries is no more than two thousand, whereas the query number often exceeds a million in the cloud. (2) Elastic compute power: traditional distributed stream processing solutions [6][7][8][9][10] usually predefine the number of computing nodes, which is very difficult in the cloud because of the dynamically changing number of queries. (3) Real-time processing: it is necessary to process all queries in a real-time manner.
In light of the above challenges, in this paper we design a new LCN-Index data stream framework for online security analysis in the cloud. LCN-Index uses the MapReduce computing paradigm to deploy all continuous queries. In the mapping stage, it decomposes each attribute into a batch of predicate sets, which are then deployed onto a mapping node using an interval predicate index. In the reducing stage, it merges the intermediate results from all mapping nodes using a multiattribute hash index. If stream processing overload is detected, the master requests more nodes from the power provider. By doing so, a stream tuple can be efficiently evaluated by going over the LCN-Index framework. Experiments demonstrate the utility of the proposed method. The rest of the paper is organized as follows. Section 2 introduces the LCN-Index framework. Sections 3, 4, and 5 theoretically study the workflow, structure, and key analysis procedure of LCN-Index. Section 6 introduces the related work, and Section 7 conducts experiments and comparisons to demonstrate the effectiveness of LCN-Index. We conclude the paper in Section 8.

The Workflow of the LCN-Index Framework
The essential goal of this paper is to develop an efficient index framework that can support scalable and complex querying in the cloud.

Architecture.
The overall system architecture is shown in Figure 2. In an offline process, before cloud stream querying, continuous queries are decomposed and indexed. The query set decomposition module is responsible for decomposing all queries into different predicate sets according to attributes. Predicate sets are then indexed by the Mapper index builder and the Reducer index builder. The Mapper index builder is responsible for building the interval predicate index (LCN-Index). The Reducer index builder is responsible for building the multiattribute index. Our approach works with any existing scheme for indexing interval predicates, e.g., [11,12]. At runtime, given a coming tuple, the LCN-Index of the Mapper is used to retrieve the matching predicates, and the multiattribute index of the Reducer is used to retrieve all satisfied queries by merging all matching predicates. For every coming tuple, the job of the Mapper Evaluator is to retrieve the matching predicates using LCN-Index, and the job of the Reducer Evaluator is to verify whether any query is satisfied by the predicates retrieved by the Mapper Evaluator. Section 3 describes the structure of LCN-Index in the Mapper, whereas Section 4 describes the efficient merging algorithm in the Reducer using the multiattribute index. Figure 2 shows the processing flow of the cloud stream. More specifically, we divide the query set Q into predicate sets and build LCN-Indexes based on the segregated predicate sets.

Workflow.
These LCN-Indexes are used by the Mapper Evaluator in the map nodes (Figure 3). We also build multiattribute indexes based on the mapping information between queries and predicates. The multiattribute indexes are used by the Reducer Evaluator to retrieve all satisfied queries. For every coming stream tuple, we integrate the processing procedure with the MapReduce model. More specifically, our driving applications are cloud stream querying applications, which have to support millions of continuous queries and billions of tuples a day. To achieve scalability, we decompose the query set into independent predicate sets, which makes the indexes easy to distribute in the cloud. To achieve elasticity, Map and Reduce nodes exchange performance-status heartbeats with the master server. If stream processing overload is detected, the master requests more nodes from the power provider (1 in Figure 1); otherwise, the master releases underused nodes. To reduce latency, for every incoming tuple, we combine the stream querying procedure with the MapReduce [4] model, which offers a massive and efficient key/value processing framework. As mentioned above, the key idea that speeds up the querying procedure for every incoming item is to combine the MapReduce [4] model with our index framework.
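The per-tuple flow described above can be sketched in a few lines. This is a simplified, hypothetical Python illustration; the function names (map_phase, reduce_phase, per_attr_search) are our own shorthand, not the system's actual API.

```python
# Simplified sketch of the per-tuple Map/Reduce flow. All names here are
# illustrative assumptions, not part of the described system.

def map_phase(stream_tuple, per_attr_search):
    """Split a tuple into attribute/value pairs and look up matching
    predicate IDs per attribute (the job of each Mapper Evaluator)."""
    matched = set()
    for attr, value in stream_tuple.items():
        matched |= per_attr_search[attr](value)
    return matched

def reduce_phase(matched_predicates, queries):
    """Emit every query whose predicates were all matched
    (the job of the Reducer Evaluator)."""
    return [qid for qid, preds in queries.items()
            if preds <= matched_predicates]
```

Because each attribute's lookup is independent, the map phase parallelizes naturally across Mapper nodes, which is the source of the framework's elasticity.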
Below, Section 4 describes the details of LCN-Index, and Section 5 describes the efficient merging algorithm of the Reducer Evaluator using the multiattribute index.

The Index Strategy
In this section, we first introduce the basic workflow and index schema of LCN-Index in our cloud data stream querying system in Section 4.1. Specifically, our focus is on the Search and Insert operations. Therefore, we discuss the Search operation in Section 4.2. The Insert and Delete operations are introduced in the appendix.
In the cloud data stream querying problem, a large number of continuous range queries can be registered against a data stream. Usually, an efficient main-memory-based index is needed, especially if the cloud stream is rapid. We propose the LCN-based index for efficient processing of continuous queries in a cloud stream environment. The LCN-Index is centered around a set of predefined virtual containment-encoded intervals. The intervals are used to decompose predicate intervals and then perform efficient Search operations.

Figure 3: Online data analysis framework. We integrate the MapReduce model with a query index strategy to efficiently support cloud data stream querying. In particular, in our index framework, we build LCN-Index on the Mapper and build multiattribute indexes on the Reducer. We focus on the highlighted components: the local LCN-Index in the Mapper and multiattribute index matching in the Reducer. In Section 3, we introduce the LCN-Index schema. In Section 4, we introduce the multiattribute index algorithm.

In fact, LCN-Index is motivated by the CEI-Index [12]. The major differences between them are as follows: (1) LCN-Index has enhanced search capability, especially supporting all predicates with equality expressions, whereas CEI-Index is designed to index only simple interval predicates. (2) CEI-Index [12] focuses on queries with a single interval predicate, whereas LCN-Index is designed to index complex queries whose WHERE clause is a conjunction of interval predicates.

The naive method for indexing interval predicates is to compare all predicate boundaries maintained in the index. The time cost of this method is usually O(log n), where n is the number of predicates. In contrast, we use an indirect indexing approach based on the concept of the Standard Interval Unit (SIU). We predefine and label a set of Standard Interval Units and decompose each predicate into one or more SIUs. We give every predicate a unique ID, named PredId, and insert the PredId into the ID lists associated with the decomposed SIUs. Given a target point value X, the search procedure is very simple: we conduct it indirectly via the SIUs, with no need to compare X with any predicate boundaries. Through the predicate interval decomposition, the search result is the union of the ID lists of the covering SIUs. We prove that only a small number of SIUs cover a random value X; therefore, the search time is independent of the number of predicates. In the rest of this paper, we assume that a query is a conjunction of predicates and that each predicate is an interval predicate of Int type; any other type can be converted to the Int type. Figure 4 shows an example of local ID labeling corresponding to an interval predicate. Assume that the range of continuous attribute A is [0, r). First, we partition r into r/L segments of length L, where L is a power of 2. Every segment is denoted as S_i, where i = 0, 1, . . . , (r/L) − 1.
Here, we assume that r is a multiple of L; if not, it is easy to expand r. The value range of segment S_i is [iL, (i + 1)L). We treat the boundaries of segments as guiding posts. For every segment, we define 2L − 1 Standard Interval Units (SIUs) as follows:

The Structure of LCN-Index.
(1) Build 1 SIU of length L, corresponding to the entire segment. (2) Build 2 SIUs of length L/2 to partition the segment into two pieces. (3) Build 4 SIUs of length L/4 to partition the segment into four pieces. (4) Repeat the partition process until the length of every SIU is 1. For example, when L = 8, there is 1 SIU of length 8, 2 SIUs of length 4, 4 SIUs of length 2, and 8 SIUs of length 1. All 2L − 1 SIUs have a special containment relationship among them: the SIUs of length 1 are contained in SIUs of length 2, which are in turn contained in SIUs of length 4, and so on.
In this paragraph, we introduce the labeling process for the SIUs of one segment. Every SIU has a unique ID composed of two parts: the segment ID and the local ID. Every segment is assigned a unique ID as a global identifier among all segments. The global ID of an SIU in segment S_i, where i = 0, 1, . . . , (r/L) − 1, is simply defined as l + 2iL, where l is the local ID. The local ID assignment follows the labeling of a perfect binary tree: the SIU of length L is assigned local ID 1.
The SIUs of length L/2 are assigned local IDs 2 and 3, respectively. Figure 4 shows the local ID assignment process in one segment. Note that we assign 2L local IDs in every segment.
In this way, all SIUs in the same segment are organized as a perfect binary tree. The SIU with local ID 1 is the root node of this tree and contains two child SIUs, each of length L/2; all leaf nodes have length 1. The ID list structures in the leaf nodes store not only the ID of each predicate but also an identifying label that indicates whether the predicate is an equality predicate. Figure 5 shows the perfect binary tree of one segment in Figure 4. The leftmost leaf node in Figure 5 shows the ID list structures. We can easily determine whether a satisfied predicate is an equality predicate through the label corresponding to the predicate ID. The perfect binary tree in one segment has many useful properties that make the Search operation in LCN-Index more efficient. We introduce the searching and inserting algorithms in the next subsection.
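The labeling scheme above can be made concrete with a short sketch, assuming (as in the text) that L is a power of two and that local IDs follow the perfect binary tree from 1 at the root down to L .. 2L − 1 at the unit-length leaves. The function names are ours.

```python
# Sketch of SIU labeling for one segment of length L (a power of two).
# Local IDs follow a perfect binary tree: root (length L) has ID 1,
# its two halves have IDs 2 and 3, and the unit-length leaves have
# IDs L .. 2L - 1.

def leaf_local_id(offset, L):
    """Local ID of the unit-length SIU covering `offset` within a segment."""
    return L + offset

def covering_local_ids(offset, L):
    """Local IDs of all SIUs containing `offset`, from leaf up to root."""
    lid = leaf_local_id(offset, L)
    ids = []
    while lid >= 1:
        ids.append(lid)
        lid //= 2          # parent in the perfect binary tree
    return ids

def global_id(local_id, segment, L):
    """Unique SIU ID across segments, as defined in the text: l + 2*i*L."""
    return local_id + 2 * segment * L
```

For L = 2^k, a value's offset is covered by exactly k + 1 SIUs (one per tree level), which is the property the Search operation exploits.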

The Search Operation on Mapper.
The Search operation is used to efficiently find all satisfied predicates for every incoming real-time attribute/value pair. Algorithm 1 shows the details of the Search algorithm. For every attribute/value pair (a, v), where a denotes the attribute ID and v the value, the Search algorithm first computes the segment ID using (1). Then, the algorithm uses (2) to compute the local ID of the leftmost unit-length SIU.
Based on the property of the perfect binary tree, we can simply check exactly (k + 1) SIUs that overlap the data value v. Hence, the search results are merged from the ID lists of these (k + 1) SIUs. We can simply locate the (k + 1) SIUs by repeatedly dividing the unit-length local ID by 2.

Figure 4: Example of the labeling on a predicate interval. The unit length is L = 2^k. We divide the interval into unit-length segments. In every segment, we build 2L − 1 = 2^(k+1) − 1 SIUs, whose local IDs range from 1 to 2^(k+1) − 1.

The Search algorithm is efficient and simple. We speed it up by translating all complex floating-point values into integers, and the division of local IDs is implemented with logical shifts. The Search algorithm is independent of the number of indexed predicates. Figure 6 shows an example of the Search algorithm with input value (a, v). The algorithm first computes the local ID of the unit-length SIU that overlaps v; in this case, it is S5, as k is set to 2. Then, we compute the local IDs of the remaining k SIUs; in this case, they are S2 and S1. Last, we compute the search result by merging the ID lists of these three SIUs (S5, S2, S1). Figure 7 also verifies that the results indeed contain P1, P2, and P3.
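Putting the pieces together, the following sketch is our own simplified reconstruction (not the paper's exact Algorithm 1) of SIU-based insertion and point search: predicate intervals are decomposed into maximal SIUs, and a search walks from the unit-length leaf up to the root, unioning ID lists without comparing any predicate boundaries.

```python
# Simplified reconstruction of SIU-based insert and search, assuming
# L is a power of two and the global SIU ID is local_id + 2*segment*L.

def insert(index, pred_id, lo, hi, L):
    """Decompose integer interval [lo, hi] into maximal SIUs and add
    pred_id to each SIU's ID list."""
    for seg in range(lo // L, hi // L + 1):
        a = max(lo, seg * L) - seg * L           # clip to this segment
        b = min(hi, (seg + 1) * L - 1) - seg * L
        for lid in _cover(1, 0, L - 1, a, b):
            index.setdefault(lid + 2 * seg * L, []).append(pred_id)

def _cover(lid, node_lo, node_hi, a, b):
    """Canonical cover of [a, b] by tree nodes, as in a segment tree."""
    if a <= node_lo and node_hi <= b:
        return [lid]
    mid = (node_lo + node_hi) // 2
    ids = []
    if a <= mid:
        ids += _cover(2 * lid, node_lo, mid, a, b)
    if b > mid:
        ids += _cover(2 * lid + 1, mid + 1, node_hi, a, b)
    return ids

def search(index, v, L):
    """Union the ID lists of the k+1 SIUs covering value v; no predicate
    boundaries are compared."""
    seg, off = divmod(v, L)
    lid = L + off                                # unit-length leaf
    result = set()
    while lid >= 1:
        result |= set(index.get(lid + 2 * seg * L, ()))
        lid //= 2                                # parent via logical shift
    return result
```

The search loop runs exactly k + 1 iterations regardless of how many predicates are indexed, which matches the claim that search time is independent of the predicate number.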

Data Stream Query Processing
In this section, we show how to merge the search results of the LCN-Indexes corresponding to different attributes, focusing on the Merge algorithm in the Reducer. A multiattribute index can be applied to merge the intermediate search results. Figure 2 shows the details of the data stream querying workflow. For every coming stream item, we divide the item into individual attribute/value pairs; the key of every pair corresponds to an attribute contained in the stream. As shown in Figure 2, we integrate the MapReduce programming model with data stream processing, which splits the processing of every stream item into two phases: Map and Reduce.

The Index Scheme in Reducer.
The two most important schemes in the Reduce phase are the following: (1) selecting the most common equality or inequality predicates as the trigger predicates; (2) building a multiattribute index based on all these trigger predicates and mapping them to queries. More precisely, given a set of queries Q and the attribute set C contained in the stream, the attributes in C can be divided into two classes: (1) discrete and (2) continuous. First, we select all predicates over discrete attributes as the predicate set. Then, we cluster all predicates into different sets according to attribute and popularity. Finally, using a multiattribute hashing function, we build indexes based on these predicate sets. For every intermediate result coming from a Mapper, merging incurs one lookup per hash table of the multiattribute indexes to find the trigger predicates.
We consider trigger predicates defined as a conjunction of equality or inequality predicates. A trigger predicate is defined by a pair &lt;id, pred&gt;, where id is an identifier and pred is a set of equality or inequality predicates that are pairwise different over their attributes. The set of attributes occurring in pred is called the Hash Combination. Let TP be a set of trigger predicates. In order to test these predicates against the incoming events of a stream item, we use a multiattribute hashing function to build indexes. Each index is intended to check trigger predicates having a certain schema. More precisely, a multiattribute index over a set of predicates is defined by a pair &lt;A, h&gt;, where A is a set of attributes that have equality predicates and h is a hash function that takes the coming event and returns the trigger predicate entry.
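A minimal sketch of such an &lt;A, h&gt; index follows, assuming trigger predicates are conjunctions of attribute = value tests; the class and method names are ours, not the paper's.

```python
# Hypothetical sketch of a multiattribute index over equality trigger
# predicates. The hash key is the tuple of values of the indexed
# attributes (the Hash Combination A), mirroring the <A, h> pair above.

class MultiAttributeIndex:
    def __init__(self, attrs):
        self.attrs = tuple(attrs)   # the Hash Combination A
        self.table = {}             # value tuple -> trigger predicate ids

    def insert(self, pred_id, pred):
        """pred: dict attr -> required value, covering all indexed attrs."""
        key = tuple(pred[a] for a in self.attrs)
        self.table.setdefault(key, []).append(pred_id)

    def lookup(self, event):
        """event: dict attr -> value from an incoming tuple; returns the
        trigger predicate entry for the event's attribute values."""
        key = tuple(event.get(a) for a in self.attrs)
        return self.table.get(key, [])
```

A lookup is one hash probe per index, independent of how many trigger predicates are registered, which is what makes merging cheap.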

The Merging Algorithm in Reducer.
The Reduce phase is responsible for merging all intermediate search results from the Mappers. The merging algorithm in the Reducer uses a set of multiattribute indexes, a predicate result bit vector, an event list, and a vector of references to query cluster lists, called a query cluster. The data structures used in the merging algorithm are depicted in Figure 8. The multiattribute index is used to compute the set of queries satisfied by a given incoming search result from a Mapper. We build the multiattribute index based on the discrete attributes whose predicates are registered with equality expressions. Equality predicates shared among one or more queries are selected to be inserted into the multiattribute index; we call these equality predicates trigger predicates. A trigger predicate p is associated with a list of query clusters. When an equality predicate is triggered, we need to check every query in the query clusters associated with the trigger predicate. The multiattribute index is used in the merging algorithm, which is introduced in the next subsection. The predicate result bit vector is used to record the results of all predicates. The query cluster is used to check all satisfied queries sharing the same predicate. In the Reduce phase, we first build the multiattribute index and deploy it in the Reducer. Then, for every intermediate search result coming from a Mapper, we check its attribute ID and search the multiattribute index to check all query clusters. If all predicate results of a query in a query cluster are true, we say the query is matched and output its ID. Figure 8 provides a detailed description of a query cluster for queries having the same equality predicate p. A query cluster is a vector of a collection of query structures. The query structure of each query consists of a collection of all predicate results and a bit denoting the query identifier.
Entry[i, j] of the query cluster contains a bit vector reference to the i-th predicate of the j-th query in the query cluster. If all bit vector entries referenced in column j are true, we say the j-th query in the query cluster is true. The most important problem is how to merge all intermediate results from the Mappers. Algorithm 2 gives the details of the merging process for all attribute/value pairs. As described above, the multiattribute index is built based on all equality predicates over discrete attributes. The discrete attribute/value pairs of one stream item are dispatched directly to a Reducer node according to their stream ID. Each discrete value pair is searched in the multiattribute indexes to trigger predicate inspection. Algorithm 2 shows the details of merging all predicate results from the Mappers. We denote every coming predicate search result as an event e. The merging algorithm is executed each time a new intermediate search result comes in. First, the predicate result bit vector is initialized to "false." Then, the merging algorithm begins a two-step procedure: the first step uses the multiattribute indexes to compute the satisfied trigger predicates, and the algorithm sets to true all corresponding bits in the predicate result bit vector. We say event e satisfies a query q if the status of every predicate in q is satisfied after trigger event e comes in. Therefore, the result merging problem is as follows: given a set of predicate search result events e and a set of queries Q, find all queries satisfied by the event set. The algorithm data structures are depicted in Figure 6. Recall that a query q is defined by an ID and a set of predicates. An event is an instance of e. Algorithm 3 shows the whole procedure of the Reduce function. Firstly, we use the relationship between queries and predicates to cluster all queries for every trigger predicate. A predicate p may also be associated with a reference to a list of query clusters.
We say predicate p is a trigger predicate for all queries in the query cluster lists of p. We guarantee that queries in the cluster list triggered by p need to be checked if and only if p is satisfied. Inside the cluster list, queries are grouped into query clusters by size. Secondly, for every coming event e, we execute the result merging algorithm. The result merging algorithm first inspects the event ID; if it is the first predicate search result of a stream item, we allocate a data structure like the one in Figure 7 for this new stream tuple and initialize all predicate bit vectors to 0. Thirdly, for the current trigger predicate ID, we inspect all queries in the associated query clusters. If all predicate results of any query are true, we add the IDs of all such queries to the output (Algorithms 4 and 5).
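The merging logic described in this subsection can be sketched as follows. This is a simplified version, not the paper's Algorithms 2-5: a Python set stands in for the predicate result bit vector, and every query is scanned instead of only the clusters triggered by the incoming predicate.

```python
# Simplified sketch of Reduce-side merging: each incoming event is a
# satisfied PredId; a query is emitted as soon as all of its predicates
# have been seen. A set stands in for the predicate result bit vector.

def merge_events(events, queries):
    """events: iterable of satisfied PredIds (intermediate Mapper results);
    queries: dict qid -> set of PredIds; returns matched qids in order."""
    satisfied = set()
    matched = []
    for pred_id in events:
        satisfied.add(pred_id)           # set the predicate's result bit
        for qid, preds in queries.items():
            # a query matches once every one of its predicates is satisfied
            if pred_id in preds and preds <= satisfied and qid not in matched:
                matched.append(qid)
    return matched
```

In the real structure, the per-event work is restricted to the query clusters associated with the trigger predicate, so the inner loop touches far fewer queries than this sketch does.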

Related Work
This paper complements ideas developed in cloud data management, Publish/Subscribe systems, and data stream querying.

Cloud Data Management.
The authors of [2,3] proposed a data management system called epiC to build a scalable data storage system in the cloud. However, our work differs from theirs: we focus on the cloud stream querying problem where large-scale queries are continuous, whereas their work [2,3] focuses on analytical jobs over large-scale datasets in the cloud. Recently, there have been some works on cloud stream processing, e.g., [13,14]; [13] tried to combine MapReduce with stream processing in IBM's System S.
They propose DEDUCE, a new middleware to support the MapReduce model. In DEDUCE, they provide a language for cutting the stream processing dataflow into MapReduce procedures. However, their work is different from ours: in particular, our work focuses on indexing scalable continuous queries to speed up the stream querying procedure, whereas DEDUCE focuses on cutting a simple workflow into MapReduce procedures. Reference [14] proposes a new processing framework to support large-scale data streams in the cloud, but focuses on how to split queries to support parallelization instead of indexing the scalable queries.

Publish/Subscribe Systems.
Publish/Subscribe systems are an active area of research [11,15]. Subscriptions expressing subscribers' interest in events are continually evaluated against publications representing events. The approaches are distinguished by the data formats they process and by their algorithmic design. Common among the approaches is the determination of a match based on the publication processed. A Boolean expression uses two types of primitives, ∈ and &lt; predicates, and queries in Publish/Subscribe systems are often a disjunctive normal form (DNF) or conjunctive normal form (CNF) of Boolean expressions, which is different from our queries with the expression SELECT * FROM * WHERE *.

Data Stream Querying.
For each up-to-date stream record, the data stream querying model traverses all continuous queries to verify the record's key/value pairs. Reference [7] aims to speed up the join operator for every incoming stream item; in their STREAM system, they always assume that the number of queries does not exceed 2000, so it is impractical to use their methods to solve the cloud stream querying problem. Over the last few decades, a line of indexing methods has been proposed to index texts, images, and microclusters on data streams for anytime querying and clustering, e.g., [12][16][17][18][19]. In particular, [16,17,19] focus on multidimensional index strategies, and [12] focuses only on interval indexing over a single attribute. However, none of the existing works considers the problem of indexing for scalable cloud stream querying. Our work can be taken as pioneering work in this direction.

Experiments
In this section, we conduct extensive experiments on both synthetic and real-world data streams to evaluate the performance and scalability of our index framework for each up-to-date stream record in the cloud. Our testing infrastructure includes 16 Hadoop machines connected together to simulate cloud computing platforms. The communication bandwidth between nodes is 1 Gbps. Each machine has a 3.00 GHz Intel Core 2 CPU, 4 GB of memory, and a 500 GB disk. The machines run the Red Hat application server 5.2 OS. Different sizes of cloud computing systems can be simulated by our infrastructure: we conducted 10 simulation experiments, ranging from 100 nodes to 1500 nodes, adding 100 nodes to the cloud computing system each time. In our index framework, we use one machine to play the role of master and dispatch the different attribute/value pairs of the cloud data stream; each of the other 15 machines simulates 100 to 1500 nodes.

Benchmark Data.
In order to test the efficiency of the LCN-Index framework, we used three real-world datasets crawled from the Internet. Table 1 lists the information about the datasets. In particular, the stock dataset is crawled from stock-analysis websites (http://www.econ.yale.edu/shiller/data.htm) and is used to simulate cloud stock stream monitoring applications. The spam detection and malicious URL detection datasets are crawled from application-level routers to simulate cloud web traffic monitoring applications. All of our queries are generated using the Zipf distribution, which is well known as a good fit for keyword popularity in text-based searches. Under the Zipf distribution, the popularity of the i-th most popular predicate is inversely proportional to its rank i, i.e., p_i ∝ 1/i^a. The query number for all three real-world datasets is 10,000,000.
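Such Zipf-distributed predicate popularity can be reproduced with a short sketch; the skew parameter a, the seed, and the function names are arbitrary choices of ours.

```python
import random

# Zipf popularity sketch: the i-th most popular predicate has weight
# proportional to 1 / i**a, matching p_i ∝ 1/i^a above.

def zipf_weights(n, a):
    """Normalized Zipf weights for ranks 1..n with skew a."""
    raw = [1.0 / (i ** a) for i in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def sample_predicate_ranks(n, a, count, seed=7):
    """Draw `count` predicate ranks following the Zipf distribution."""
    rng = random.Random(seed)
    return rng.choices(range(1, n + 1), weights=zipf_weights(n, a), k=count)
```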

Benchmark Methods.
For comparison purposes, we implement the distributed R-Tree index described in [17], denoted "DistributeRTree." In DistributeRTree, every query is taken as a multidimensional matrix and inserted into the tree, and DistributeRTree maintains a large R-Tree over the network. However, the search cost of DistributeRTree may be much higher for stream processing as more queries are inserted. Therefore, the total number of queries DistributeRTree can support is less than that of our system.

Measurements.
Two important measurements are used here. (1) Time cost: by using an index framework that integrates LCN-Index and MapReduce to divide the query index according to attributes, LCN-Index is expected to achieve a much lower computation overhead, whereas DistributeRTree, which is based on the conventional data stream processing procedure, is expected to incur a much higher cost. (2) Scalability: as the LCN-Index framework divides the query set into different predicate sets according to attributes, LCN-Index supports more scalable queries than DistributeRTree.

Experimental Results.
We compared the two index strategies under different parameters: query number n, node number, attribute number, segment length L, and query width w. Unless otherwise mentioned, the parameters are set as follows: the default query number is 100000 and the default node number is 100. Figure 9 shows the performance comparison with the DistributeRTree index strategy under different query scales on the real-world data stream sets. The query number is the most important parameter for evaluating performance. On these datasets, our LCN-Index always outperforms DistributeRTree. When the query scale is enlarged, the throughput of DistributeRTree obviously degrades, owing to the search cost of the R-Tree. In LCN-Index, a stream item is processed by different Mappers in parallel, whereas in DistributeRTree we cannot apply a parallel Search algorithm, because each query is taken as a matrix and randomly distributed among the cloud nodes. Therefore, LCN-Index clearly provides significantly better support for large-scale continuous queries.

The Impact of Attributes.
For the cloud data stream querying problem, the Reducer of LCN-Index indexes only the equality or inequality predicates associated with the discrete attributes, so the question is the impact of discrete versus continuous attributes. To answer this question, we conducted a series of experiments with different attribute numbers. Figure 7 shows the impact of the attribute number. From the results, we observe that the LCN-Index framework degrades as we increase the number of discrete attributes, because more index scanning is needed in the merging algorithm of the Reduce phase, whereas the number of continuous attributes does not impact the performance of LCN-Index, because LCN-Index fully supports parallelization.

The Impact of Query Width w.
To investigate whether the query width impacts LCN-Index efficiency, we compare the LCN-Index and DistributeRTree frameworks on several well-known standard datasets. From Figure 10, we can draw two important conclusions: (1) LCN-Index can significantly reduce the stream querying cost. For example, in the spam detection data stream, when the width w equals 7, LCN-Index needs 487146 ms to process a stream item, while DistributeRTree needs 1397197 ms. (2) As we increase the width w, the querying costs of LCN-Index and DistributeRTree both increase, but the cost of DistributeRTree increases more quickly than that of LCN-Index.
This is because the LCN-Index of one Mapper maintains only the predicates of the query set that belong to the same attribute, and the increasing width w only impacts the merging algorithm in the Reducer. In contrast, as the width increases, DistributeRTree's cost increases faster because more nodes have to be split and deployed in the network, which makes the Search algorithm more costly.

The Impact of L.
In this part, we compare the impact of L on the three datasets. We use L to denote the length of a segment in LCN-Index.

Algorithm (overview). Input: query set Q, stream S. Step 1: partition Q into a batch of predicate sets P. Step 2: build interval indexes and multiattribute indexes based on P. Step 3: deploy these indexes on the Mappers and Reducers. Step 4: for every incoming tuple t ← stream S, while t is not empty do ...

As L increases, the cost becomes bigger. This is because we need to inspect more SIU ID lists when L increases. The M values of the LE-Tree and GE-Tree methods were set to 30. From the results, we observe that LCN-Index performs better than DistributeRTree. For example, in the Syn-10 dataset, LE-Tree is nearly three times faster than DistributeRTree. Therefore, we can safely say that, compared to the DistributeRTree framework, LCN-Index is more suitable for cloud data stream querying. It is obvious that the performance of LCN-Index, which indexes queries according to attribute to support parallelization, is more scalable than that of DistributeRTree, which simply distributes the R-Tree in the network for processing.

Conclusions
Industrial cloud security is a new challenge. This paper presents a new elastic cloud data analysis system that supports scalable multidimensional continuous query inspection. In the online data analysis framework, we propose a new indexing schema to efficiently process every incoming online data tuple. The key idea of the data analysis framework is to integrate the MapReduce model with the industrial communication tuple filtering procedure. Experiments on both synthetic and real-world industrial streams show that our online data analysis framework is efficient, elastic, and scalable.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.