Subgraph-Indexed Sequential Subdivision for Continuous Subgraph Matching on Dynamic Knowledge Graph

Continuous subgraph matching on dynamic graphs has become a popular research topic in graph analysis, with a wide range of applications including information retrieval and community detection. Specifically, given a query graph q, an initial graph G_0, and a graph update stream △G_i, the continuous subgraph matching problem is to sequentially find all subgraphs of G_i (= G_0 ⊕ △G_i) that are isomorphic to q and cover △G_i. Since a knowledge graph is a directed labeled multigraph that can have multiple edges between a pair of vertices, dynamic knowledge graphs bring new challenges to this problem. One challenge is that the multigraph characteristic of knowledge graphs, which combine complex topological and attributed structures, increases the complexity of candidate calculation. Another challenge is that the isomorphic subgraphs covering a given region must be found in a huge search space of seed candidates, so much time is spent searching unpromising candidates. To address these challenges, we propose a method of subgraph-indexed sequential subdivision to accelerate continuous subgraph matching on dynamic knowledge graphs. First, a flow graph index is proposed to arrange the search space of seed candidates in the topological knowledge graph, and an adjacent index is designed to accelerate the identification of candidate activation states in the attributed knowledge graph. Second, the sequential subdivision of the flow graph index and a state transition model are employed to incrementally conduct subgraph matching and to maintain the regional influence of changed candidates, respectively. Finally, extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.


Introduction
The problem of subgraph matching is a fundamental issue in graph search and is NP-complete [1]. Specifically, given a query graph q and a large data graph G, subgraph matching extracts all subgraphs of G that are isomorphic to q. In the real world, data in social networks usually arrives as a stream, forming a graph stream. Recently, continuous subgraph matching on dynamic graphs has become a popular research topic in graph analysis, with a wide range of applications including query answering [2], information retrieval [3,4], and community detection [5,6]. Specifically, given a query graph q, an initial graph G_0, and a graph update stream △G_i, the continuous subgraph matching problem is to sequentially find all subgraphs of G_i (= G_0 ⊕ △G_i) that are isomorphic to q and cover △G_i. In this paper, we study continuous subgraph matching on a special graph structure, the knowledge graph (KG-CSM).
Despite the complex multigraph characteristic of knowledge graphs and the high computational complexity of continuous subgraph matching [1], recent research has made significant advances in developing computational paradigms for KG-CSM.
One line of work stores and indexes RDF triple data using relational approaches. Weiss et al. [7] and Pérez et al. [8] employed an index-based solution that stores triples directly in a B+-tree index over multiple redundant 〈s, p, o〉 permutations. Abadi et al. [9] vertically partitioned the RDF triples into a set of tables keyed by pattern labels and used an index structure on top of them to locate the required tables. Broekstra et al. [10] built on the idea of a graph database and abstracted RDF triples with multiple properties. The same pattern matching strategy was used to provide a pattern selectivity approach, which can determine the search space over data tables. This strategy used a tree-pattern structure to filter RDF data into tables that stored partially evaluated data units. Then, these partial data units were incrementally joined by searching the tree-pattern structure. However, relational approaches require extensive indexing and data preprocessing because they are coupled with sophisticated statistics, deep joins, and query-optimization techniques.
Another line of work avoids recalculating matches by reusing intermediate results.
Incremental solutions have been employed in a variety of applications [11][12][13].
These solutions use incremental strategies to generate results without incurring the expensive cost of recalculation. However, most incremental methods are approximate algorithms based on relaxed graph simulation and only work for a small number of graphs. Moreover, incremental solutions are hard to apply to KG-CSM because of the inherent complexity and large scale of the knowledge multigraph structure.

Challenge 1: Multigraph Characteristic of Knowledge Graph Intensifies the Complexity of Candidate Calculation.
A knowledge graph is a directed labeled multigraph that can have multiple edges between a pair of vertices; each vertex represents an entity with attributes, and each edge denotes an inter-entity relationship. As the knowledge multigraph model in Figure 1 shows, a knowledge graph is composed of attributed and topological structures. The attributed structure describes the attributes and type of an entity: an attribute is taken as an edge label coupled with a value, and a type is taken as the label of the entity. The topological structure describes the relationships between pairs of entities, and some relationships coexist, e.g., a partnership and a marital relationship between the same two persons. The multigraph characteristic of knowledge graphs leads to a denser adjacency structure than in general graphs, which brings a new challenge to the KG-CSM problem. Furthermore, the KG-CSM problem still contains the traditional challenge of general graphs.

Challenge 2: Subgraph Isomorphic Mappings Covering a Given Region Are Conducted on a Huge Search Space of Seed Candidates.
The traditional challenge on general graphs is that the isomorphic subgraphs covering a given region must be found in a huge search space of seed candidates, so much time is spent searching unpromising candidates. Consider query graph q and data graph G in Figure 2, and suppose an edge (v_10, v_11) is inserted into G. The resulting isomorphic subgraph, expressed as a subgraph isomorphic mapping, is {〈v_1, u_1〉, 〈v_4, u_2〉, 〈v_10, u_3〉, 〈v_5, u_4〉, 〈v_11, u_5〉}. The basic strategy searches the global space of G without pruning the unpromising vertices v_2, v_3, v_6, v_7, and v_9.

Contributions.
Three empirical studies motivate us to develop an efficient subgraph matching method on dynamic knowledge graphs. The first [5,14] demonstrated that a tree-based index can prune the noncandidates of a dynamic graph by analyzing the influence of anchored and followed relationships. The second [15] demonstrated that sequential techniques can effectively limit the search space of a graph update stream. The third [16], our prior research on subgraph indexes for static knowledge graphs, demonstrated that a subgraph index can effectively accelerate subgraph matching on static knowledge graphs. In this paper, we propose a method of subgraph-indexed sequential subdivision to accelerate continuous subgraph matching on dynamic knowledge graphs. Our contributions are as follows: (1) We develop a flow graph index to prune the noncandidates of query vertices on the topological knowledge graph. The flow graph index is defined as a flow graph (FG), a directed multigraph constructed from the initial data graph G_0 and guided by a matching order of the query graph. Each vertex of FG denotes a candidate of a query vertex, and that query vertex is taken as the label of the candidate. Each edge of FG corresponds to a relationship between nodes in the matching tree of the query graph. The flow graph index can effectively reduce the scale of the original data graph. (2) We design an adjacent index to accelerate the identification of candidate activation states on the attributed knowledge graph. The adjacent index offers three benefits. First, it improves the time efficiency of checking the inclusion relationships of node pairs. Second, it can quickly verify the changed state of a seed candidate as the graph update stream is incrementally inserted. Third, it can quickly search the adjacent candidate region.
(3) We propose a sequential subdivision technique for the flow graph to limit the search expansion caused by the graph update stream. The sequential numbers of root candidates are assigned to the vertices of the subdivided flow graphs and limit the search space originating from the changed candidates of FG. (4) We design a state transition model, consisting of three states and six transition rules, to describe the transitions of changed candidates. Based on this model, we analyze the influence of changed candidates on the adjacent region and design our incremental maintenance strategy. (5) We design an incremental subgraph matching algorithm based on the sequentially subdivided flow graph. The consistency of subgraph matching is guaranteed by two verifications of selected candidates: relational verification and sequential verification. Relational verification checks local isomorphism, and sequential verification checks the equivalence of sequential numbers between a local subgraph mapping and the selected candidates. With the aid of these two verifications, the isomorphic subgraphs are found incrementally and efficiently.
Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms. The rest of this paper is organized as follows. Section 2 introduces the preliminaries, including problem definitions and related work. Section 3 presents the flow graph index of the knowledge graph, including its definition and construction. Section 4 presents incremental subgraph matching on the flow graph, including the sequential subdivision technique, incremental maintenance, and incremental subgraph matching on the graph update stream. Experimental results are reported in Section 5. Section 6 concludes the paper.

Preliminaries
In this section, the definitions of knowledge graphs and subgraph matching are given first. Then, related research is introduced.

Problem Definition.
A knowledge graph KG is a directed labeled multigraph that can have multiple edges between a pair of vertices. The labels of KG are extracted from RDF information. The Resource Description Framework (RDF) [17] is a standard semantic model designed by the W3C and is represented as a set of triples 〈S, P, O〉. Each triple 〈s, p, o〉 consists of three components: a subject, a predicate, and an object. Furthermore, a triple 〈s, p, o〉 belongs to I × I × (I ∪ L), where I denotes the set of IRIs (Internationalized Resource Identifiers) and L denotes the set of literals.
By extending the RDF triple model with a timestamp, the model can represent an RDF stream, denoted as (〈s, p, o〉: t) [18], where 〈s, p, o〉 is an RDF triple and t is a timestamp. The labels of KG are classified as instance-labels, relation-labels, attribute-labels, and type-labels according to the resources and inter-resource relationships of RDF data. Consider an RDF triple 〈s, p, o〉. If both s and o are IRIs and p is a typed predicate, e.g., rdf:type or rdfs:subClassOf, then o is a type-label. If both s and o are IRIs and p is not a typed predicate, then s and o are instance-labels and p is a relation-label. If o is a literal, then p is an attribute-label.
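The label classification rules above can be written as a small Python function. This is a sketch assuming that terms are already parsed into IRIs and literals and that the typed predicates are exactly `rdf:type` and `rdfs:subClassOf` written as prefixed names; the classes `IRI` and `Literal` and the function `classify_labels` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[IRI, Literal]

# Assumed set of typed predicates (prefixed names, not full IRIs).
TYPED_PREDICATES = {"rdf:type", "rdfs:subClassOf"}


def classify_labels(s: Term, p: Term, o: Term) -> Dict[str, str]:
    """Return the KG label kind that the rules assign to parts of <s, p, o>."""
    # A well-formed triple lies in I x I x (I u L).
    if not (isinstance(s, IRI) and isinstance(p, IRI)):
        raise ValueError("subject and predicate must be IRIs")
    if isinstance(o, Literal):
        # o is a literal: p becomes an attribute-label
        return {p.value: "attribute-label"}
    if p.value in TYPED_PREDICATES:
        # s and o are IRIs and p is typed: o becomes a type-label
        return {o.value: "type-label"}
    # s and o are IRIs and p is not typed: instance- and relation-labels
    return {s.value: "instance-label",
            o.value: "instance-label",
            p.value: "relation-label"}
```

For example, `classify_labels(IRI("ex:alice"), IRI("ex:age"), Literal("30"))` marks `ex:age` as an attribute-label.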
Definition 1 (knowledge graph). A knowledge graph is a directed labeled multigraph G(V, E, L), where V is a set of vertices, E ⊆ V × V is a set of directed edges, and L (= L_V ∪ L_E) is a labeling function: L_V assigns type-labels and attribute-labels to vertices, and L_E assigns relation-labels to edges. In a KG, each vertex and each edge can be assigned multiple labels. For example, consider vertex v_1 and edge (v_2, v_1) of the KG in Figure 3(b): type-labels or attribute-labels A and B are assigned to v_1, relation-labels a and e are assigned to (v_2, v_1), and relation-label a is assigned to (v_1, v_2). Hence, a KG has denser intervertex relationships than a general graph.

Subgraph Matching.
The problem of subgraph matching is to find all subgraphs of a data graph G that are isomorphic to a query graph q. Subgraph matching is formally defined via subgraph isomorphism in Definition 2.
Definition 2 (subgraph isomorphism). Given a data graph G(V, E, L) and a query graph q(V_q, E_q, L), q is subgraph isomorphic to G if and only if there exists an injective mapping M from V_q to V such that, for every u ∈ V_q, M(u) ∈ V and the following conditions hold:
$$\forall u \in V_q:\ L_V(u) \subseteq L_V(M(u)),$$
$$\forall (u, u') \in E_q:\ (M(u), M(u')) \in E \ \wedge\ L_E(u, u') \subseteq L_E(M(u), M(u')).$$
Such a mapping is called a subgraph isomorphic mapping (subgraph mapping for short) of q on G. For example, consider the labeled query graph and data graph in Figures 3(a) and 3(b), respectively, where {A, B, C} is the set of vertex labels. q is subgraph isomorphic to G since there exist subgraph isomorphic mappings M_1 = {〈v_1, u_1〉, 〈v_2, u_2〉, 〈v_2, u_3〉, 〈v_4, u_4〉} and M_2 = {〈v_2, u_1〉, 〈v_1, u_2〉, 〈v_4, u_3〉, 〈v_3, u_4〉}.
Similar to subgraph isomorphism on static graphs, subgraph isomorphism on dynamic graphs is extended with a graph update stream. The continuous subgraph matching problem is defined in Definition 3.
Definition 3 (KG-CSM). Given a query multigraph q, an initial data multigraph G_0, and a multigraph update stream △G_i, the continuous subgraph matching problem identifies all positive/negative subgraph mappings for each update edge in △G_i.
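Checking whether a given candidate mapping is a subgraph mapping can be sketched as follows. The sketch assumes the usual conditions for labeled multigraphs: M is injective, every query vertex label is included in its image's labels, and every query edge maps to a data edge whose label set includes the query edge's labels. Parallel edges between a pair are represented by one key carrying a set of relation labels; `is_subgraph_mapping` and the dictionary layout are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Vertex = str
VLabels = Dict[Vertex, FrozenSet[str]]
ELabels = Dict[Tuple[Vertex, Vertex], FrozenSet[str]]


def is_subgraph_mapping(mapping: Dict[Vertex, Vertex],  # u -> M(u)
                        q_vlabels: VLabels, q_elabels: ELabels,
                        g_vlabels: VLabels, g_elabels: ELabels) -> bool:
    # M must cover every query vertex and be injective
    if set(mapping) != set(q_vlabels):
        return False
    if len(set(mapping.values())) != len(mapping):
        return False
    # vertex-label inclusion: L_V(u) is a subset of L_V(M(u))
    for u, v in mapping.items():
        if not q_vlabels[u] <= g_vlabels.get(v, frozenset()):
            return False
    # each query edge needs a data edge whose label set includes its labels
    for (u1, u2), labels in q_elabels.items():
        data_labels = g_elabels.get((mapping[u1], mapping[u2]))
        if data_labels is None or not labels <= data_labels:
            return False
    return True
```

Representing the parallel edges (v_2, v_1) labeled a and e as one key with the label set {a, e} keeps edge-label inclusion a single subset test.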
In this paper, our research on the KG-CSM problem focuses on constructing an effective lightweight index to arrange the search space of seed candidates. Then, a region-limited technique is used to constrain the expansion of the search space caused by the graph update stream.
In this paper, we focus on directed labeled graphs G(V, E, L), where V is a set of vertices, E ⊆ V × V is a set of edges, and L (= L_V ∪ L_E) is a labeling function that assigns one or more labels to each vertex and edge. Both q and G are directed labeled graphs; whether edges are directed or undirected does not affect the execution scheduling of subgraph matching. The notations and their meanings are listed in Table 1.

Related Works.
In this section, we review related work on indexing and subgraph matching algorithms for knowledge graphs and general graphs, and then outline their limitations.

Subgraph Matching of Knowledge Graph.
The storage structure of RDF data should be introduced before discussing indexes for knowledge graphs. A knowledge graph is modeled by the Resource Description Framework (RDF), a standard semantic data model designed by the W3C. The storage structure for RDF data most commonly adopted in research is the relational store, which is used in many systems, e.g., SW-Store [9], Sesame [10], Jena [19], and RDF-3X [20]. Relational approaches can be classified into vertical representation and horizontal representation.
In the vertical representation approach, RDF data is conceptually stored in a single triple table over a relational schema. Due to the large size of RDF data and the potentially large number of self-joins required to answer queries, care must be taken to devise an efficient physical layout with suitable indexes to support query answering. However, this approach leads to many overlapping copies of RDF triples, because many triple stores [21,22] aggressively store the triple table in multiple sorted orders. A clustered B-tree is constructed, and the desired triple ordering is available in its leaves. However, B-tree approaches are more demanding in terms of storage space because efficient query answering requires various sorted orderings to be available for fast merge joins.
In the horizontal representation approach, RDF data is stored in one or more wide tables by interpreting predicates as column names. To minimize the storage overhead caused by empty cells, property table approaches [23,24] were proposed, which divide the wide table into multiple smaller tables containing related predicates. However, this approach creates many small tables, which hurts query evaluation performance. In particular, a vertically partitioned approach was proposed that takes this decomposition to its extreme, solving the issues of empty cells and multiple objects at the same time. Abadi et al. [9] and Sidirourgos et al. [25] noted that this approach performs best when the binary tables are sorted lexicographically to allow fast joins.
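To make the storage trade-off concrete, the following minimal Python sketch keeps the triple table in several sorted orders and answers a bound-prefix lookup by binary search. Sorted lists stand in for the clustered B-trees of [21,22]; the choice of the three orders SPO, POS, and OSP and the helper names are assumptions for illustration.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
ORDERS = ("spo", "pos", "osp")  # assumed subset of the six permutations


def build_indexes(triples: List[Triple]) -> Dict[str, List[Tuple[str, ...]]]:
    """Store every triple once per order, each copy sorted lexicographically."""
    idx = {}
    for order in ORDERS:
        perm = ["spo".index(c) for c in order]      # e.g. "pos" -> [1, 2, 0]
        idx[order] = sorted(tuple(t[i] for i in perm) for t in triples)
    return idx


def prefix_scan(index: List[Tuple[str, ...]], prefix: Tuple[str, ...]):
    """Return all entries starting with the bound prefix."""
    k = len(prefix)
    i = bisect_left(index, prefix)   # first entry >= prefix
    out = []
    while i < len(index) and index[i][:k] == prefix:
        out.append(index[i])
        i += 1
    return out
```

Every triple appears once per stored order, which is the storage overhead that makes these approaches demanding in space.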
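The vertically partitioned layout can be sketched in a few lines of Python: one two-column (subject, object) table per predicate, kept in lexicographic order so that two tables can be joined on the subject with a single merge pass. The function names are hypothetical, and the sketch ignores dictionary encoding and on-disk layout.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (subject, object)


def vertically_partition(triples: List[Tuple[str, str, str]]) -> Dict[str, List[Row]]:
    tables: Dict[str, List[Row]] = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    for rows in tables.values():
        rows.sort()  # lexicographic (subject, object) order enables merge joins
    return dict(tables)


def merge_join_on_subject(left: List[Row], right: List[Row]) -> List[Tuple[str, str, str]]:
    """Subject-subject merge join of two sorted binary tables."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # find the run of rows sharing this subject on each side
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            for _, lobj in left[i:i_end]:
                for _, robj in right[j:j_end]:
                    out.append((ls, lobj, robj))
            i, j = i_end, j_end
    return out
```

A subject with several objects for the same predicate simply occupies several rows, so no empty cells arise.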

Subgraph Matching on RDF Stream.
Most studies on indexing and pattern matching of streaming RDF data are designed to leverage existing solutions for nonstreaming RDF data. The standardization of streaming RDF data is still under debate, and the W3C RSP Community Group is an important initiative. These studies use various custom query languages extended from SPARQL to answer queries. Additionally, they still follow a relational approach to storing and indexing RDF data.
For instance, C-SPARQL [3] used the underlying Jena architecture to store and index triples within property tables, whereas CQELS employed index-based solutions that store triples directly in asc-trees over multiple redundant permutations and used Eddy operators and query optimizations to index triples. SparkWave [4] used a RETE network to determine the set of triggers to fire when a new triple arrives, and it materialized intermediate results to reduce the work required for each update. These systems store RDF data in relational tables and process queries using relational operators, such as scan and join.
However, relational stores need many join operations to evaluate queries, especially queries with complex and large graph patterns. Meanwhile, index-based relational approaches represent progress toward more dynamic environments by allowing continuous monitoring and periodic re-evaluation of the index design. Most of these approaches employ re-evaluation strategies for optimizing the execution plan, which requires an incremental indexing technique to maintain intermediate results automatically and incrementally. Most RDF Stream Processing (RSP) systems are based on a recalculation model. Recalculating matches can waste computational resources whenever data are updated within a window. For instance, the Eddy operators [26] employed by CQELS [18] resulted in expensive computations and continuous resource usage to explore all plans, thus requiring a fully pipelined execution for RDF streams.
Furthermore, caching the statistical measure of triples and choosing the correct order for every triple update causes considerable overhead.
A prominent contribution for RDF streams is SPECTRA [27], in which a set of vertically partitioned views collects the summarized data from each event and sibling lists incrementally index the joined triples between views. The matched results are kept in a set of final views, enabling incremental evaluation as new events arrive. Although combining RSP with incremental algorithms improves execution time for streaming RDF data, SPECTRA still relies on a relational approach with a high join depth and a strong focus on independent events, which motivates our study. We argue that incremental evaluation can greatly reduce computation and improve execution performance.

Subgraph Matching on General Graph.
The problem of pattern matching on RDF graphs is similar to the problem of subgraph isomorphism on general graphs. Ullmann [28] proposed a backtracking algorithm that significantly reduces the size of the search space. VF2 [29] is a well-known state-of-the-art algorithm that proposed a state space representation for different exact graph matching problems: each state is a partial mapping between the two given graphs, while goal states are complete mappings consistent with the problem constraints. Hence, the search space is explored through a depth-first strategy with backtracking, driven by a set of feasibility rules that prune unfruitful search paths. SPath [30] implemented a path-at-a-time search: it decomposes the query graph into several paths, finds the embeddings of each path, and joins them later. TurboISO [31] and BoostISO [32] tried to find better matching orders to make subgraph matching more efficient, and used graph compression to reduce space complexity. To reduce duplicate adjacent candidates, CFLMatch [33] proposed a compact path index. The compact path index is a multipath index induced by a spanning tree of the query graph, and it is composed of multiple clusters and intercluster relations; each cluster collects the candidates of one query vertex. TurboFlux [13] proposed a data-centric path index, which further eliminates the storage of duplicate candidates by attaching query vertices as a label set to data vertices. TurboFlux employed this data-centric index to accelerate continuous subgraph matching on dynamic graphs.

Table 1: Notations and their meanings.
Notation | Meaning
q = (V_q, E_q, L) | A directed query multigraph with vertex set V_q, edge set E_q, and labeling function L
G = (V, E, L) | A directed data multigraph with vertex set V, edge set E, and labeling function L
… | A candidate region of u adjacent to v, u ∈ V_q, v ∈ V

However, subgraph isomorphism algorithms can hardly be migrated to pattern matching on RDF graphs, as shown in [34], essentially because of the asymmetric structural characteristics of RDF graphs. Indexing and machine learning technologies employ semantic and structural characteristics to enhance the semantic equivalence of KGs. The S2R-tree [35] is a pivot-based hierarchical index that seamlessly integrates spatial and semantic information and uses a mechanism to transform high-dimensional semantic vectors into a low-dimensional space. FS-ELM [36] proposed a predictive model of future stars, which evaluates rising stars by exploiting social topology characteristics and user behavior patterns in geo-social networks. MTLM [37] proposed a multitask learning model for traversal time estimation, which first recommends the appropriate transportation mode for users and then estimates the traversal time of the path in a tree pattern. RQL [38] designed a reinforcement-learning-based algorithm for the dynamic bipartite graph matching problem, which makes near-optimal decisions on batch splitting with a constant competitive ratio. Gao et al. [39] proposed a framework for privacy-preserving subgraph pattern matching in the cloud, which uses a label-generated privacy model to protect and label the potential privacy in both data graphs and pattern graphs.
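The depth-first, backtracking exploration shared by Ullmann's algorithm and VF2 can be sketched as follows. This is a generic textbook matcher, not Ullmann's matrix refinement or VF2's specific feasibility rules: candidates are filtered by vertex-label inclusion, and a partial mapping is extended only if every query edge to an already-matched vertex exists in the data graph. The function name and the caller-supplied matching order are assumptions.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Edge = Tuple[str, str]


def backtrack_matches(q_labels: Dict[str, FrozenSet[str]], q_edges: Set[Edge],
                      g_labels: Dict[str, FrozenSet[str]], g_edges: Set[Edge],
                      order: List[str]) -> Iterator[Dict[str, str]]:
    def feasible(u: str, v: str, m: Dict[str, str]) -> bool:
        # injectivity and vertex-label inclusion
        if v in m.values() or not q_labels[u] <= g_labels[v]:
            return False
        # query edges between u and matched vertices must exist in G
        for w, x in m.items():
            if (u, w) in q_edges and (v, x) not in g_edges:
                return False
            if (w, u) in q_edges and (x, v) not in g_edges:
                return False
        return True

    def extend(i: int, m: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(order):
            yield dict(m)          # a goal state: complete mapping
            return
        u = order[i]
        for v in g_labels:
            if feasible(u, v, m):
                m[u] = v
                yield from extend(i + 1, m)
                del m[u]           # backtrack

    yield from extend(0, {})
```

Each partial mapping corresponds to a search state, and the `feasible` check plays the role of the pruning rules.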

Subgraph Matching on Dynamic Graph.
A dynamic graph is modeled as a graph whose edges are activated by sequences of time-dependent elements. Wang et al. [6] discussed the definition and topological structure of time-dependent graphs, as well as models of their relationship to dynamic systems. In addition, they reviewed some classic problems on time-dependent graphs and studied the weight-constrained route planning problem over a large time-dependent graph with continuous time and weight functions [40]. Choudhury et al. [41] provided a subgraph selectivity approach to determine subgraph search strategies and used a subgraph tree structure to decompose the query graph into smaller subgraphs, which are responsible for storing partial results. However, retaining and querying thousands of edges within a large window requires considerable space and computational resources. Moreover, this approach only supports simple path-based queries and is optimized for homogeneous graphs using an edge stream model. Fan et al. [12] presented algorithms for graph pattern matching over evolving graphs that employ a repeated search strategy, recalculating matches until a fixed point is reached each time the graph is updated. However, the repeated search strategy can increase the time consumption of subgraph matching. More closely related technologies employ semantic and structural characteristics to improve performance on dynamic problems. INC-GPM [42] built an index to incrementally record the shortest path length range between different label types and then identified the parts affected by the graph update stream. DCSGR [43] exploited the connections between group users in community detection and proposed an aggregation function to integrate the recommended media lists of all interest subgroups into the final group recommendation results.

Flow Graph Index of Knowledge Graph
In this section, a flow graph index (FG) of the knowledge graph is proposed to arrange the search space of seed candidates. Before introducing FG, we first outline our solution to the KG-CSM problem (Algorithm 1) to clarify the core role of FG in our algorithm.
Algorithm 1, named the incremental pattern matching algorithm (iPM), gives the pseudocode of continuous subgraph matching. A matching tree q_T orchestrates a matching order to iteratively conduct subgraph mappings (Line 1). In this paper, we use the matching order generated by a depth-first traversal rather than computing a near-optimal matching order, because we focus on incremental computation over the graph update stream. The flow graph index FG of the knowledge graph is constructed from the sequential and mapping relationships of q_T on G_0 (Line 3 and Section 3.2) and incrementally maintained under the graph update stream △G_i (Line 5 and Section 4.1). Then, all subgraph mappings covering △G_i are directly conducted by iterative traversal on FG (Line 6 and Section 4.2). The core role of FG in our algorithm consists of three parts. The first part is the initial construction of FG, denoted △FG_0, which is guided by the matching tree q_T on the initial data graph G_0. The second part is the maintenance of △FG_0 under the graph update stream △G_i, denoted △FG_i. The third part is the incremental matching of △FG_i in the adaptive matching order △q_T. Thus, FG plays the core role in our approach; it is introduced in Section 3.1.
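The overall control flow of Algorithm 1 (apply each update, then report the mappings that cover it) can be illustrated with a deliberately simplified Python sketch. It is not the FG-based procedure described here: the flow graph index, the matching tree, and sequential subdivision are replaced by a plain backtracking search seeded on the update edge, which reports positive mappings after an insertion and negative mappings before a deletion, in the spirit of Definition 3. The names `ipm` and `covering_matches` are hypothetical; query self-loops are skipped, and inserted edges are assumed to be new.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, List, Set, Tuple

Edge = Tuple[str, str]
Mapping = Dict[str, str]


def covering_matches(q_labels: Dict[str, FrozenSet[str]], q_edges: Set[Edge],
                     g_labels: Dict[str, FrozenSet[str]], g_edges: Set[Edge],
                     upd: Edge) -> Iterator[Mapping]:
    """Yield every mapping of q on G that maps some query edge onto upd."""
    a, b = upd

    def ok(u: str, v: str, m: Mapping) -> bool:
        # injective, label inclusion, edges to matched vertices must exist
        if v in m.values() or not q_labels[u] <= g_labels[v]:
            return False
        for w, x in m.items():
            if (u, w) in q_edges and (v, x) not in g_edges:
                return False
            if (w, u) in q_edges and (x, v) not in g_edges:
                return False
        return True

    def extend(m: Mapping) -> Iterator[Mapping]:
        rest = [u for u in q_labels if u not in m]
        if not rest:
            yield dict(m)
            return
        u = rest[0]
        for v in g_labels:
            if ok(u, v, m):
                m[u] = v
                yield from extend(m)
                del m[u]

    # seed the search by placing each query edge on the update edge
    for u1, u2 in q_edges:
        if u1 == u2:
            continue  # query self-loops are out of scope for this sketch
        seed: Mapping = {}
        if ok(u1, a, seed):
            seed[u1] = a
            if ok(u2, b, seed):
                seed[u2] = b
                yield from extend(seed)


def ipm(q_labels: Dict[str, FrozenSet[str]], q_edges: Set[Edge],
        g_labels: Dict[str, FrozenSet[str]], g_edges: Set[Edge],
        stream: Iterable[Tuple[str, Edge]]) -> List[Tuple[str, Mapping]]:
    """Report positive (+) and negative (-) mappings for each update edge."""
    results: List[Tuple[str, Mapping]] = []
    for op, e in stream:
        if op == "+":
            g_edges.add(e)  # apply the insertion, then report new mappings
            results += [("+", m) for m in
                        covering_matches(q_labels, q_edges, g_labels, g_edges, e)]
        elif op == "-":
            # report the mappings that disappear, then apply the deletion
            results += [("-", m) for m in
                        covering_matches(q_labels, q_edges, g_labels, g_edges, e)]
            g_edges.discard(e)
    return results
```

Because a mapping is injective, at most one query edge can land on a given update edge, so each covering mapping is reported once.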

Data Index of Knowledge Graph.
A knowledge graph is a directed labeled multigraph that combines complex topological and attributed structures. The data index of a knowledge graph is composed of a flow graph index and an adjacent index. The flow graph index is constructed from the topological structure of the knowledge graph and is used to arrange the search space of seed candidates.

Flow Graph Index of Topological Knowledge Graph.
The flow graph index is defined as a flow graph (FG), which arranges the binary relationships between pairs of data vertices in G. The binary relationships of FG follow the parent-child relationships of a spanning tree of q. We divide the edges of q into tree edges and nontree edges according to the parent-child relationships of the spanning tree. A spanning tree together with the nontree edges of q is called a matching tree, denoted q_T. For the query graph in Figure 2(a), a matching tree ordered by a depth-first traversal on q is shown in Figure 4(a).
Solid lines denote tree edges and dotted lines denote nontree edges. For a tree edge (u_1, u_2), u_1 is the parent of u_2, denoted u_2.p = u_1. A flow graph is constructed under the guidance of the matching tree, as described in Definition 4.
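A depth-first matching tree can be sketched in Python as follows. It assumes a connected query graph, ignores edge direction when building the spanning tree, and visits neighbours in sorted edge order so that the result is deterministic; the tie-breaking rule and the name `matching_tree` are assumptions, not the authors' q_T-Generation procedure.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def matching_tree(vertices: List[str], edges: Set[Edge], root: str):
    """Split query edges into tree and nontree edges of a DFS spanning tree.

    Returns (dfs_order, parent, tree_edges, nontree_edges); parent[u] is u.p.
    """
    adj: Dict[str, List[str]] = {u: [] for u in vertices}
    for a, b in sorted(edges):      # sorted for a deterministic traversal
        adj[a].append(b)
        adj[b].append(a)            # direction is ignored when spanning q
    parent: Dict[str, Optional[str]] = {root: None}
    order: List[str] = []

    def dfs(u: str) -> None:
        order.append(u)
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                dfs(w)

    dfs(root)
    # an edge is a tree edge if it joins a vertex and its DFS parent
    tree = {(a, b) for a, b in edges
            if parent.get(b) == a or parent.get(a) == b}
    return order, parent, tree, set(edges) - tree
```

If both (u, u′) and (u′, u) exist between a vertex and its parent, this sketch classifies both as tree edges.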

Definition 4 (flow graph). A flow graph is a directed labeled multigraph, denoted $FG_q = (V_F, E_F, L_F)$.
Here, V_F is a set of vertices, E_F is a set of edges, and L_F is a labeling function that assigns one or more labels to vertices.
Each vertex of V_F refers to a query-data vertex pair (node pair for short) of q on G. For a node pair 〈v, u〉 with v ∈ V and u ∈ V_q, v is a vertex labeled by u in FG. Each edge of E_F is a tree edge or a nontree edge, as in the matching tree. For node pairs 〈v, u〉 and 〈v′, u′〉 such that u.p = u′ and v is a neighbor of v′, v′ is the parent of v, denoted v.p = v′. Consider the data graph in Figure 2(b) and the matching tree in Figure 4(a); the resulting flow graph is shown in Figure 4(b). Vertices v_1 and v_4, which are labeled by u_1 and u_2, respectively, satisfy v_4.p = v_1, because u_2.p = u_1 and v_4 is a neighbor of v_1. An unconstrained number of node pairs may cause a huge number of vertices in FG: for a query graph q of size n and a data graph G of size m, there can be n × m node pairs. Two strategies are used to address this. One is to use a labeling function that assigns multiple labels to vertices, which avoids storing vertices repeatedly in FG. The other is to design constraint rules for node pairs, which are given in the definition of candidate verification (Definition 5).
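The construction of FG from node pairs can be sketched as follows. This is a simplified sketch under stated assumptions: candidates are filtered only by vertex-label inclusion (standing in for constraint (1) of Definition 5), nontree edges are omitted, and each data vertex is stored once with the set of query vertices it is a candidate of, following the first strategy above. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Optional, Set, Tuple

Edge = Tuple[str, str]


def build_flow_graph(q_labels: Dict[str, FrozenSet[str]],
                     q_parent: Dict[str, Optional[str]],
                     g_labels: Dict[str, FrozenSet[str]],
                     g_edges: Set[Edge]):
    """Return (fg_labels, fg_edges) of a simplified flow graph."""
    nbrs: Dict[str, Set[str]] = defaultdict(set)
    for a, b in g_edges:            # neighbours in either direction
        nbrs[a].add(b)
        nbrs[b].add(a)

    # One FG vertex per data vertex, labelled with every query vertex it is
    # a candidate of (assumed test: vertex-label inclusion); others are pruned.
    fg_labels: Dict[str, Set[str]] = {}
    for v, lv in g_labels.items():
        us = {u for u, lu in q_labels.items() if lu <= lv}
        if us:
            fg_labels[v] = us

    # Tree edge (v', v) labelled u when u.p = u', v is a candidate of u,
    # v' is a candidate of u', and v is a neighbour of v' (so v.p = v').
    fg_edges: Dict[Edge, Set[str]] = defaultdict(set)
    for u, up in q_parent.items():
        if up is None:
            continue                # the root has no parent
        for v, us in fg_labels.items():
            if u in us:
                for vp in nbrs[v]:
                    if up in fg_labels.get(vp, set()):
                        fg_edges[(vp, v)].add(u)
    return fg_labels, dict(fg_edges)
```

A data vertex that is a candidate of several query vertices, such as `v2` in the check below, occupies a single FG vertex carrying several labels.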
Definition 5 (candidate verification). Given a node pair 〈v, u〉, data vertex v is a candidate of query vertex u if and only if the pair satisfies constraints (1), (2), and (3), where L_V and L_E denote the labeling functions of vertices and edges, respectively. The constraints of candidate verification can effectively reduce the number of node pairs: a node pair is deleted if it does not satisfy constraint (1). Furthermore, we divide node pairs into positive and negative node pairs according to relaxed and strict constraints. A node pair np: 〈v, u〉 is a negative node pair if it satisfies constraint (1) but not all of constraints (1), (2), and (3), and it is a positive node pair if it satisfies constraints (1), (2), and (3). In the flow graph in Figure 4(b), solid circles denote positive candidates and dotted circles denote negative candidates. A negative candidate may become positive when the graph update stream is inserted into FG. To express this change, a node pair state, denoted State(〈v, u〉), captures the active and silent states of node pairs. If np: 〈v, u〉 satisfies only the relaxed constraint, State(np) = 0; if it satisfies the strict constraints, State(np) = 1. In both cases np is encapsulated into a labeled vertex in FG; otherwise, it is pruned.
A node pair state is composed of a candidate state CS and a following state FS, denoted State(np) = CS(np) ∧ FS(np). The candidate state CS distinguishes negative and positive node pairs: a node pair np is positive if CS(np) = 1. The following state FS describes the candidate states of followers. We denote the descendants of query vertex u as Des(u). Given a node pair np: 〈v, u〉, a candidate v′ of a vertex in Des(u) is a follower of np if it is reachable from v, and v is then called its dominator. For a node pair np: 〈v, u〉, FS(np) = 1 if it satisfies
$$\forall u' \in Des(u),\ \exists v' \in Des(np):\ CS(\langle v', u' \rangle) = 1,$$
where Des(np) denotes the followers of np. When the query vertex u is clear from context, the node pair state, candidate state, and following state of np: 〈v, u〉 are abbreviated as State(v), CS(v), and FS(v).
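The state of a node pair can be computed directly from these definitions. In this minimal sketch, CS values are given as input (the constraints of Definition 5 are not modeled), the followers Des(np) are taken to be the node pairs reachable from np through FG parent-child edges, and Des(u) is taken from the child lists of the matching tree; `following_state`, `state`, and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

NodePair = Tuple[str, str]  # (data vertex v, query vertex u)


def _reachable(start, children) -> Set:
    """All nodes reachable from start through the children map (start excluded)."""
    seen, stack = set(), list(children.get(start, []))
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(children.get(x, []))
    return seen


def following_state(np: NodePair, cs: Dict[NodePair, int],
                    q_children: Dict[str, List[str]],
                    fg_children: Dict[NodePair, List[NodePair]]) -> int:
    followers = _reachable(np, fg_children)       # Des(np)
    for u2 in _reachable(np[1], q_children):      # Des(u)
        # some follower of np labelled u2 must be a positive candidate
        if not any(cs.get(f, 0) == 1 for f in followers if f[1] == u2):
            return 0
    return 1


def state(np: NodePair, cs: Dict[NodePair, int],
          q_children: Dict[str, List[str]],
          fg_children: Dict[NodePair, List[NodePair]]) -> int:
    """State(np) = CS(np) AND FS(np)."""
    return cs.get(np, 0) & following_state(np, cs, q_children, fg_children)
```

For a leaf query vertex Des(u) is empty, so FS is 1 and the state reduces to CS.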

Adjacent Index of Attributed Knowledge Graph.
In this paper, we focus on the problem of continuous subgraph matching on a special kind of graph, the knowledge graph. A knowledge graph (KG) is a directed labeled multigraph that can have multiple edges between a pair of vertices. The labels of a KG can be classified into type labels and attribute labels, and each vertex of a KG can carry one or more labels.
To deal with the challenge of the multigraph characteristic, adjacent indexes of query and data vertices are proposed to accelerate candidate verification between the initial data graph and the graph update stream. For the query and data multigraphs in Figure 3, the adjacent indexes of query vertex u1 and data vertex v2 are given in Table 2. Here, AL, OEL, and IEL denote the labels of adjacent vertices, outer (outgoing) edges, and inner (incoming) edges of a query or data vertex, respectively. The first benefit is that adjacent indexes speed up the computation of the inclusion relationships of a node pair. Considering the common adjacent label B of u1 and v1, the label counts can be used to quickly calculate the inclusion relationships of u1. The third benefit is that adjacent indexes can quickly locate the adjacent candidate region.
Through the verification of the common adjacent label B of u1 and v2, it can be found that v1 and v4 are negative candidates of u2, since they satisfy constraint (1) in Definition 5. Furthermore, the final adjacent candidate region can be reduced by intersecting the common neighbors associated with different unique labels.
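A minimal Python sketch of the adjacent index follows. It assumes that each index entry stores label counts (AL, OEL, IEL) and that the inclusion relationship means every query count is at most the corresponding data count; the exact constraints of Definition 5 are not reproduced here, and the function names and toy graphs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (source, edge label, target); parallel edges are allowed


def build_adjacent_index(vertex_labels: Dict[str, Set[str]], edges: Iterable[Edge]):
    """Count adjacent vertex labels (AL), outer edge labels (OEL), and inner edge labels (IEL)."""
    index = defaultdict(lambda: {"AL": Counter(), "OEL": Counter(), "IEL": Counter()})
    for src, elabel, dst in edges:
        index[src]["OEL"][elabel] += 1
        index[dst]["IEL"][elabel] += 1
        index[src]["AL"].update(vertex_labels[dst])  # a vertex may carry several labels
        index[dst]["AL"].update(vertex_labels[src])
    return index


def includes(data_entry, query_entry) -> bool:
    """Assumed inclusion test: the data vertex has at least as many of every label as the query vertex."""
    return all(data_entry[kind][label] >= count
               for kind in ("AL", "OEL", "IEL")
               for label, count in query_entry[kind].items())


def adjacent_region(v: str, labels: Set[str], vertex_labels: Dict[str, Set[str]],
                    neighbors: Dict[str, Set[str]]) -> Set[str]:
    """Neighbors of v carrying every label in `labels`, by intersecting per-label neighbor sets."""
    per_label = [{w for w in neighbors[v] if label in vertex_labels[w]} for label in labels]
    return set.intersection(*per_label) if per_label else set()


# Toy multigraphs: u1 reaches u2 through two differently labeled edges.
q_labels = {"u1": {"A"}, "u2": {"B"}}
q_index = build_adjacent_index(q_labels, [("u1", "p", "u2"), ("u1", "r", "u2")])
g_labels = {"v1": {"B"}, "v2": {"A"}, "v3": {"C"}, "v4": {"B"}}
g_edges = [("v2", "p", "v1"), ("v2", "r", "v1"), ("v2", "p", "v4"), ("v2", "q", "v3")]
g_index = build_adjacent_index(g_labels, g_edges)
neighbors = {"v2": {"v1", "v3", "v4"}}
print(includes(g_index["v2"], q_index["u1"]))                      # True
print(includes(g_index["v3"], q_index["u1"]))                      # False
print(sorted(adjacent_region("v2", {"B"}, g_labels, neighbors)))   # ['v1', 'v4']
```

Counting labels rather than storing raw edge lists keeps each inclusion test proportional to the number of distinct labels, which is the kind of speed-up the adjacent index is meant to provide for multigraphs.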

Time and Space Complexity of Knowledge Graph Index.
The data index of a knowledge graph is composed of the flow graph index and the adjacent index. The time and space complexities of the flow graph index are as follows.
The worst-case space complexity of FG is O(2|V| + (2 + |V_q|) · |V| · (|V| − 1)), for three reasons. First, the number of vertices in FG is at most |V|, which occurs when each vertex is a candidate of one query vertex. Second, the number of edges in FG is at most 2 · |V| · (|V| − 1), which occurs when every vertex pair has an edge relationship coupled with two tree edges and two nontree edges. Third, the number of edge labels in FG is at most |V_q| · |V| · (|V| − 1), which occurs when each tree edge is assigned to all query vertices; in practice, the tree-edge labels can be encoded as a |V_q|-bit string. The worst-case time complexity of inserting or deleting one vertex v on FG is O(|V_q| + |E| − 1); this worst case requires that v is a common candidate of all query vertices in q, v is connected with all other vertices in V_F, and |V_F| = |V|. For the adjacent index, the space complexity is equivalent to that of a doubly linked list, and the worst-case time complexity of inserting or deleting one vertex on FG is O(|E| − 1), which occurs when the vertex is connected with all other vertices in V_F and |V_F| = |V|.
Input: a query graph q, an initial data graph G_0, and a graph update stream △G_i. Output: the set M of all subgraph mappings of q in G_0 ⊕ △G_i. (1) q_T ← q_T-Generation(q);
Table 2: outer edge label index of u1; outer edge label index of v2.
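The |V_q|-bit encoding of tree-edge labels and the space bound above can be sketched in Python as follows. Representing the bit string as a Python integer and the helper names are illustrative assumptions.

```python
from typing import Iterable, List, Sequence


def encode_tree_edge_labels(query_vertices: Sequence[str], assigned: Iterable[str]) -> int:
    """Bit i is set iff the tree edge is assigned to query_vertices[i] (a |V_q|-bit string)."""
    position = {u: i for i, u in enumerate(query_vertices)}
    mask = 0
    for u in assigned:
        mask |= 1 << position[u]
    return mask


def decode_tree_edge_labels(query_vertices: Sequence[str], mask: int) -> List[str]:
    """Recover the query vertices whose bits are set."""
    return [u for i, u in enumerate(query_vertices) if (mask >> i) & 1]


def fg_space_bound(num_data_vertices: int, num_query_vertices: int) -> int:
    """Evaluate 2|V| + (2 + |V_q|) * |V| * (|V| - 1), the worst-case space term stated above."""
    n, k = num_data_vertices, num_query_vertices
    return 2 * n + (2 + k) * n * (n - 1)


qv = ["u1", "u2", "u3"]
mask = encode_tree_edge_labels(qv, {"u1", "u3"})
print(bin(mask))                          # 0b101
print(decode_tree_edge_labels(qv, mask))  # ['u1', 'u3']
print(fg_space_bound(4, 3))               # 68, i.e., 2*4 + (2+3)*4*3
```

A single integer per tree edge stores all |V_q| label bits, so the label term of the bound costs a constant number of machine words per edge whenever |V_q| is small.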
In this paper, our research on the KG-CSM problem focuses on constructing an effective, lightweight index that organizes the search space of seed candidates. The complexity analysis above indicates that our knowledge graph index has linear consumption, which is beneficial for indexing a single large-scale data graph.

Construction of Flow Graph.
In this section, the construction algorithm of the flow graph is introduced in Algorithm 2. The inputs are an initial data graph G_0, a query graph q, and its matching tree q_T. The output is the subgraph index of the flow graph △FG_0. A matching tree q_T orchestrates a matching order (u_0, u_1, · · ·, u_{|V_T|−1}) and the parent-child relationships of query vertices in q. The construction algorithm of the flow graph contains three modules (Algorithm 2). The first module verifies the candidate state of each node pair with the aid of adjacent indexes (Lines 3-6). The following and candidate states of a node pair are initialized as 0 and −1, respectively. For a node pair np: 〈v, u〉, v is a negative candidate of u if np satisfies constraint (1) in Definition 5, in which case the candidate state of np is marked as 0 (Line 4). If np satisfies constraints (1), (2), and (3) in Definition 5, v is a positive candidate of u, and the candidate state of np is marked as 1 (Line 5). If u is a leaf node in q_T and v is a positive candidate of u, the following state of np is marked as 1 (Line 6), because a leaf node has no descendants. All negative and positive candidates are added into the candidate sets and the vertex set V_F (Lines 4-5), where C(u) denotes the set of candidates of query vertex u. The second module verifies the node pair state by calculating the following state in bottom-up matching order (Lines 7-14). For node pairs 〈v, u〉 and 〈v′, u′〉 such that u′.p = u and v, v′ are positive candidates of u and u′, respectively, v.FS(u) is updated as v′.State(u′) ∨ v.FS(u), and then v.State(u) = v.CS(u) ∧ v.FS(u) (Lines 11-13). The third module inserts the edges into FG in top-down matching order. For node pairs 〈v, u〉 and 〈v′, u′〉 such that v is a candidate of u, if u.p = u′, then (v′, v) is a tree edge and is inserted into FG; otherwise, (v, v′) is a nontree edge and is inserted into FG.
The function (v′, v) ⟶_{SN min} E_F denotes that (v′, v) is inserted into E_F and the minimum sequential number of v′ is assigned to v, as described in Section 4. Tree edges and nontree edges distinguish how node pairs are handled in continuous subgraph matching. Considering Figure 4, an example of the FG construction algorithm is given in Table 3. The matching order of q_T is the sequence of query vertices u1, u2, u3, u4, and u5, and a query vertex is marked as 1 once it is visited. In the first module of Algorithm 2, the following and candidate states of node pairs are initialized as 0 and −1, respectively. Since 〈v7, u2〉 is a negative node pair, CS(〈v7, u2〉) is marked as 0, and the candidate states of the other node pairs are marked as 1. In the second module, the node pair state is verified by calculating the candidate and following states in bottom-up matching order. For node pair 〈v7, u2〉, where u3.p = u2, no candidate of u4 adjacent to v7 can be found, so the following state FS(〈v7, u2〉) is marked as 0. In the third module, the edges are inserted into FG, and the minimum sequential numbers of parent nodes are assigned to child nodes in top-down matching order. For node pair 〈v11, u5〉, where u5.p = u4 and C(u4) = {v5, v6, v9}, the minimum numbers of v5, v6, and v9 are propagated to v11, so the sequential numbers of v11 are 1, 2, and 3.
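The second and third modules can be sketched in Python for a small instance. This is a simplified reading under stated assumptions: node pairs are keyed by (v, u); FS is computed child by child (each child query vertex needs at least one adjacent candidate whose State is 1, following the condition in Definition 5 rather than the single OR update of Line 12); and each child candidate receives, from every adjacent parent candidate, that parent's minimum sequential number. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def build_flow_graph(order: List[str], parent: Dict[str, Optional[str]],
                     cand: Dict[str, List[str]], cs: Dict[Tuple[str, str], int],
                     adj: Dict[str, Set[str]]):
    """order: matching order of q_T with order[0] the root; cand[u]: C(u) (negative and positive)."""
    children = defaultdict(list)
    for u in order:
        if parent[u] is not None:
            children[parent[u]].append(u)

    # Module 2 (bottom-up): State = CS AND FS; leaves have FS = 1 because all([]) is True.
    state: Dict[Tuple[str, str], int] = {}
    for u in reversed(order):
        for v in cand[u]:
            nbrs = adj.get(v, set())
            fs = all(any(state[(w, c)] for w in cand[c] if w in nbrs) for c in children[u])
            state[(v, u)] = int(cs[(v, u)] == 1 and fs)

    # Module 3 (top-down): root candidates get numbers 1, 2, ...; tree edges copy minimum numbers.
    sn: Dict[Tuple[str, str], Set[int]] = {
        (v, order[0]): {i + 1} for i, v in enumerate(cand[order[0]])}
    tree_edges: List[Tuple[str, str]] = []
    for u in order[1:]:
        p = parent[u]
        for v in cand[u]:
            nums = set()
            for w in cand[p]:
                if w in adj.get(v, set()) and (w, p) in sn:
                    tree_edges.append((w, v))  # tree edge (v', v) from parent candidate to child
                    nums.add(min(sn[(w, p)]))
            if nums:
                sn[(v, u)] = nums
    return state, sn, tree_edges


order = ["u1", "u2"]
parent = {"u1": None, "u2": "u1"}
cand = {"u1": ["r1", "r2"], "u2": ["x", "y"]}
cs = {("r1", "u1"): 1, ("r2", "u1"): 1, ("x", "u2"): 1, ("y", "u2"): 0}
adj = {"r1": {"x"}, "r2": {"x", "y"}, "x": {"r1", "r2"}, "y": {"r2"}}
state, sn, tree_edges = build_flow_graph(order, parent, cand, cs, adj)
print(state[("r1", "u1")], state[("y", "u2")])  # 1 0
print(sn[("x", "u2")], sn[("y", "u2")])          # {1, 2} {2}
```

As in the v11 example, a child reached from several parents collects several sequential numbers, and negative candidates such as y still receive numbers.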

Incremental Subgraph Matching on Flow Graph Index
In this section, a sequential subdivision technique for the flow graph is first given to limit the search derivation of the graph update stream. Then, strategies for incremental subgraph matching and incremental maintenance are proposed based on the divided flow graph.

Sequential Subdivision of Flow Graph Index.
The sequential subdivision of the flow graph divides a flow graph into multiple flow subgraphs and sequentially encodes the vertices of the flow subgraphs. The flow graph is divided on the basis of the candidates of the root node in q_T, written FG(v) with v ∈ C(u_0). Here, u_0 is the originating node of the query vertices in matching order q_T, and v is called a root candidate of the root node u_0. For the flow graph in Figure 4(b), the divided flow subgraphs are shown in Figure 5(a) as FG(v1), FG(v2), and FG(v3).
All subgraph mappings can be conducted by traversing the flow subgraphs, as stated in Theorem 1.

Theorem 1. Every subgraph mapping of q on G is included in one flow subgraph.
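Proof. For Theorem 1, consider a subgraph mapping M = {〈v1, u1〉, 〈v2, u2〉, · · ·, 〈vn, un〉}, where u1 is the root of q_T. Then v1 is a root candidate, and every other vi is a candidate of a descendant of u1 that is reachable from v1, that is, a follower of v1. Hence, M is found in the flow subgraph FG(v1).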
In the subdivision of the flow graph, the root candidates of FG are sequentially numbered to identify the relationships between flow subgraphs through a unique encoding technique. The unique encoding of vertices effectively avoids redundant allocation of the vertices shared by multiple flow subgraphs. For the flow graph in Figure 4(b), the sequential encoding is shown in Figure 5(a). The sequential number of a root node pair is passed and copied to all its negative and positive followers. If a node pair is a common follower of multiple root node pairs, only that node pair is marked with the numbers of those root node pairs, and its followers are not marked repeatedly. For example, the follower v12 of vertex v7 in Figure 5(b) is not marked repeatedly in FG(v3).
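Assuming tree edges point from parent candidates to child candidates, the subdivision into flow subgraphs FG(r), one per root candidate r, can be sketched as a reachability search; vertices reached from several roots are exactly the common followers discussed above. The names are hypothetical, node pair states are ignored, and for simplicity the sketch records every number on every vertex, whereas the encoding described above avoids re-marking the followers of a common follower.

```python
from typing import Dict, Iterable, List, Set


def flow_subgraphs(root_candidates: Iterable[str],
                   tree_succ: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """FG(r): the root candidate r together with every vertex reachable from r via tree edges."""
    result = {}
    for r in root_candidates:
        seen, stack = {r}, [r]
        while stack:  # iterative depth-first search
            x = stack.pop()
            for y in tree_succ.get(x, []):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        result[r] = seen
    return result


def sequential_numbers(roots: List[str], subgraphs: Dict[str, Set[str]]) -> Dict[str, Set[int]]:
    """Number roots 1, 2, ... in order; each vertex collects the numbers of all flow subgraphs containing it."""
    sn: Dict[str, Set[int]] = {}
    for i, r in enumerate(roots, start=1):
        for x in subgraphs[r]:
            sn.setdefault(x, set()).add(i)
    return sn


roots = ["r1", "r2"]
tree_succ = {"r1": ["a"], "r2": ["a", "b"], "a": ["c"]}
fgs = flow_subgraphs(roots, tree_succ)
sn = sequential_numbers(roots, fgs)
print(sorted(fgs["r1"]), sorted(fgs["r2"]))                 # ['a', 'c', 'r1'] ['a', 'b', 'c', 'r2']
print(sorted(x for x, nums in sn.items() if len(nums) > 1))  # ['a', 'c'] (common followers)
```

By construction every mapping rooted at r lies inside FG(r), which is the property Theorem 1 relies on.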
We observe a property of sequential flow subgraphs that effectively limits the search derivation of the graph update stream, as described in Lemma 1. Proof. For Lemma 1, consider a subgraph mapping 〈v0, u0〉, 〈v1, u1〉, · · ·, 〈vn, un〉 originating from candidate v0 of query vertex u0. If there exists a descendant u′ of u0 for which no positive candidate can be found, then no subgraph mapping can be conducted through the node pairs of u′.
Benefiting from the sequential subdivision of FG, the first aspect is that repeated encoding of vertices in the following region is avoided. For the sequential flow subgraphs FG(v2) and FG(v3), the follower v12 does not need to be encoded in FG(v3), because v12 is a follower of v7, which has already been encoded with the new number 3. The second aspect is that the common flow subgraphs of inserted edges can be verified in advance. For the inserted edge (v6, v4) in Figure 5(a), incremental subgraph matching of (v6, v4) does not need to be executed, because v6 and v4 belong to different flow subgraphs: v6 is located in FG(v2) and FG(v3), whereas v4 is located in FG(v1).
ALGORITHM 2: FG construction algorithm.
Input: a matching tree q_T, a query graph q, and a data graph G
Output: the flow graph △FG_0
(6) if PosCandVerify(v, u) and u is leaf then v.State(u) = 1
(7) Reset V_q as unvisited; mark u_{|V_T|−1} as visited;
(8) for u ∈ V_q and u = u_i from i = |V_T| − 1 to 0 do
(9)   for u′ ∈ N(u) and u′ is visited do
(10)    for v ∈ N(v′) and v′ ∈ C(u′) do
(11)      if v.CS(u) = 1 and u′.p = u then
(12)        v.FS(u) = v′.State(u′) ∨ v.FS(u)
(13)        v.State(u) = v.CS(u) ∧ v.FS(u)
(14)  Mark u as visited;
(15) Set V_q as unvisited; mark u_0 as visited;
(16) for u ∈ V_q and u = u_i from i = 0 to |V_T| − 1 do
(17)   for u′ ∈ N(u) and u′ is visited do
(18)     for v ∈ N(v′) and v′ ∈ C(u′) do
(19)       if v.CS(u) ≠ −1 and u.p = u′ then (v′, v) ⟶_{SN min} E_F;
(20)       if v.CS(u) ≠ −1 and u.p ≠ u′ then (v, v′) ⟶ E_F;
(21)   Mark u as visited;
(22) return △FG_0
Proof. For Lemma 2, according to Theorem 1, every subgraph mapping of q on G is included in one of the flow subgraphs. Since a flow subgraph is composed of a root candidate and its followers, every flow subgraph contains at least one dominator, namely its root candidate. Thus, vertices v and v′ cannot be included in the same subgraph mapping if they have no common dominator.
The third aspect is that deleted edges can block the dominating relationships of flow subgraphs. For the deleted vertex v4 in Figure 4(a), no subgraph mapping containing v5 can be found after v4 is deleted. The influence of a deleted vertex on dominating relationships is described in Lemma 3.
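The second aspect, deciding from sequential numbers alone whether an inserted edge can contribute a new mapping (Lemma 2), can be sketched as a set-intersection filter. It assumes each vertex's set of sequential numbers lists every flow subgraph that contains it; the numbering FG(v1) → 1, FG(v2) → 2, FG(v3) → 3 mirrors the (v6, v4) example above but is assumed for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def share_dominator(x: str, y: str, sn: Dict[str, Set[int]]) -> bool:
    """True iff x and y lie in a common flow subgraph, i.e., share a sequential number."""
    return bool(sn.get(x, set()) & sn.get(y, set()))


def edges_needing_matching(inserted: List[Tuple[str, str]],
                           sn: Dict[str, Set[int]]) -> List[Tuple[str, str]]:
    """Skip inserted edges whose endpoints have no common dominator (no mapping can use them)."""
    return [(x, y) for x, y in inserted if share_dominator(x, y, sn)]


sn = {"v4": {1}, "v6": {2, 3}, "v7": {2, 3}, "v12": {3}}
print(edges_needing_matching([("v6", "v4"), ("v7", "v12")], sn))  # [('v7', 'v12')]
```

The check costs one set intersection per inserted edge, so unpromising updates are discarded before any search begins.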

Incremental Maintenance.
The incremental maintenance of FG employs a state transition model to identify how changes in candidate states affect the subgraph index of the flow graph, which in turn supports incremental subgraph matching on the flow graph. The state transition model consists of three candidate states (−1, 0, 1) and six transition rules (Transitions 1-6), which capture how a changed candidate affects its adjacent region as it moves from one state to another. Vertices in states 0 and 1 are the negative and positive candidates, respectively, that are included in the previous FG. A vertex in state −1 denotes an inserted negative or positive candidate that is not included in the previous FG. The six transition rules describe how candidate states change as vertices are inserted and deleted in the graph update stream (Figure 6). Furthermore, we analyze the influence of the six transition rules on the adjacent region of a changed candidate in FG. To reflect the impact of a changed candidate state on the structure of our flow graph, we divide the adjacent region of a changed candidate into three subregions: the parent, child, and nontree subregions. A changed candidate may trigger state transitions of vertices in the three subregions through Transitions 2 and 5. The three subregions of the changed vertex v7 are shown in Figure 6(b), namely v7.p = {v2, v3}, v7.c = {v11}, and v7.nt = {v6, v9}; the parent, child, and nontree subregions of v7 are filled with blue, yellow, and purple, respectively. The influence of a changed candidate on the vertices in the parent subregion is as follows:
(1) For a vertex v changed by Transitions 2, 3, 5, and 6, consider node pairs 〈v, u〉 and 〈v′, u′〉 with u.p = u′, so that v is a follower of v′. The following state of v′ may then be reversed by the state change of v.
(2) For a vertex v changed by Transitions 1 and 3, consider node pairs 〈v, u〉 and 〈v′, u′〉 with u.p = u′. If v′ is marked by the minimum sequential number min of a root candidate, min is copied to v.
For the candidate v12 activated by transition rule 2 in Figures 5(b) and 5(c), State(v7) is true because its follower v12 is a positive candidate. For the candidate v12 activated by transition rule 1 in Figures 5(a) and 5(b), v12 is marked by the minimum sequential number 2. The influence of a changed candidate on the vertices in the child subregion is as follows:
(1) For a vertex v′ changed by Transitions 4 and 6, consider node pairs 〈v, u〉 and 〈v′, u′〉 with u.p = u′. The sequential number of v may be deleted according to Lemma 3, because v′ may block the dominating relationships of FG.
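Partitioning the adjacent region of a changed vertex into parent, child, and nontree subregions can be sketched as follows, assuming FG stores tree edges as (parent candidate, child candidate) pairs and nontree edges as unordered pairs. The edge lists below are hypothetical, chosen so that the output matches the subregions of v7 given in the text; the function name is also hypothetical.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def subregions(v: str, tree_edges: Iterable[Edge],
               nontree_edges: Iterable[Edge]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (v.p, v.c, v.nt): tree-edge parents, tree-edge children, and nontree neighbors of v."""
    tree_edges = list(tree_edges)  # iterated twice below
    parents = {a for a, b in tree_edges if b == v}
    children = {b for a, b in tree_edges if a == v}
    nontree = {b if a == v else a for a, b in nontree_edges if v in (a, b)}
    return parents, children, nontree


tree = [("v2", "v7"), ("v3", "v7"), ("v7", "v11"), ("v2", "v6")]
nontree = [("v7", "v6"), ("v9", "v7"), ("v6", "v9")]
p, c, nt = subregions("v7", tree, nontree)
print(sorted(p), sorted(c), sorted(nt))  # ['v2', 'v3'] ['v11'] ['v6', 'v9']
```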
The incremental maintenance algorithm is described in Algorithm 3. A preprocessing step first merges the graph update stream △G_i into the flow graph FG of the initial graph G_0. Then, the changed vertices of △G_i are analyzed by transition rules 1-6. The incremental maintenance algorithm processes all changed vertices affected by the state transition rules until no candidate transition remains in FG (Algorithm 3).
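The loop "until no candidate transition remains in FG" behaves like a worklist fixpoint. The sketch below covers only the parent-subregion effect (a changed State may reverse the following state of its parent candidates); sequential-number maintenance and the six individual transition rules are not modeled. Node pair identifiers and the input dictionaries are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List


def propagate_states(changed: Iterable[str], cs: Dict[str, int], state: Dict[str, int],
                     parents_of: Dict[str, List[str]],
                     children_by_query: Dict[str, Dict[str, List[str]]]) -> Dict[str, int]:
    """Recompute State = CS AND FS for changed node pairs and their ancestors until nothing changes.

    parents_of[p]: node pairs of the parent query vertex linked to p by a tree edge.
    children_by_query[p]: child query vertex -> node pairs linked to p by tree edges.
    """
    queue = deque(changed)
    while queue:
        pair = queue.popleft()
        groups = children_by_query.get(pair, {})
        fs = all(any(state.get(c, 0) == 1 for c in group) for group in groups.values())
        new_state = int(cs.get(pair, -1) == 1 and fs)
        if new_state != state.get(pair):
            state[pair] = new_state
            queue.extend(parents_of.get(pair, []))  # parents may need their FS reversed
    return state


# A leaf node pair "x@u2" turns positive; its parent "r@u1" is then activated.
cs = {"r@u1": 1, "x@u2": 1}
state = {"r@u1": 0, "x@u2": 0}
result = propagate_states(["x@u2"], cs, state, {"x@u2": ["r@u1"]}, {"r@u1": {"u2": ["x@u2"]}})
print(result)  # {'r@u1': 1, 'x@u2': 1}
```

Because parent states depend only on child states along the matching tree, each change travels upward and the loop terminates once no state flips.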

Incremental Subgraph Matching.
In this section, incremental subgraph matching is presented to conduct the subgraph mappings of q on the initial graph G_0 and the graph update stream △G_i.
Two matching orders are designed to orchestrate the traversal sequence of node pairs in the initial graph G_0 and the graph update stream △G_i. The matching order q_T of the initial graph orchestrates both the traversal order of flow graph construction and the matching sequence of subgraph mappings. The matching order q_T is fixed during subgraph matching on the initial graph, and the originating nodes of q_T are the candidates of the root node in q_T. The matching order △q_T of the graph update stream orchestrates the matching sequence of subgraph mappings during incremental maintenance of the graph update stream. The matching order △q_T changes with each active candidate during subgraph matching on the graph update stream. Given an active node pair 〈v, u2〉, the traversal first traces back to the root candidate of q_T in backward order and then traverses the nodes of the other paths in forward order. For an active node pair 〈v, u2〉 produced by transition rules 2 and 3, △q_T = (u2, u1, u4, u5, u3) in Figure 4(a), where u1 is the root of q_T. The consistency of subgraph matching is guaranteed by two verifications of inserted node pairs: relational verification and sequential verification. The relational verification checks the local isomorphism between the partial subgraph mapping and the selected node pair, as described in Verification 1.
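The construction of △q_T (trace back from the active query vertex to the root, then traverse the remaining branches forward) can be sketched as follows. The tree used below (u2 and u4 children of u1, u3 a child of u2, u5 a child of u4) is an assumption: it is consistent with the parent relations mentioned in the earlier example but is not confirmed by the text. Under that assumption, the sketch reproduces △q_T = (u2, u1, u4, u5, u3); the function name is hypothetical.

```python
from typing import Dict, List, Optional


def update_matching_order(start: str, parent: Dict[str, Optional[str]],
                          children: Dict[str, List[str]]) -> List[str]:
    """Backward from `start` to the root of q_T, then forward (preorder) through the other branches."""
    path = [start]
    while parent[path[-1]] is not None:  # backward order: start, ..., root
        path.append(parent[path[-1]])
    order, on_path = list(path), set(path)
    for anchor in reversed(path):  # forward order, beginning at the root
        stack = [c for c in reversed(children.get(anchor, [])) if c not in on_path]
        while stack:
            x = stack.pop()
            order.append(x)
            stack.extend(reversed(children.get(x, [])))
    return order


parent = {"u1": None, "u2": "u1", "u3": "u2", "u4": "u1", "u5": "u4"}
children = {"u1": ["u2", "u4"], "u2": ["u3"], "u4": ["u5"]}
print(update_matching_order("u2", parent, children))  # ['u2', 'u1', 'u4', 'u5', 'u3']
```

Starting from the root itself, the same function returns the plain preorder (u1, u2, u3, u4, u5), which coincides with the fixed matching order q_T of the example.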
Verification 1 (relational consistency). Given a partial subgraph mapping M_{i−1} and a selected node pair 〈v_i, u_i〉, data vertex v_i is relationally consistent with query vertex u_i if and only if it satisfies the following constraint: Here, State(〈v_i, u_i〉) denotes the state of node pair 〈v_i, u_i〉. State(〈v_i, u_i〉) is true if and only if CS(〈v_i, u_i〉) is true and ∀u′ ∈ Des(u_i), ∃v′ ∈ F(v_i): CS(〈v′, u′〉) = 1, according to Definition 5 and Lemma 1; that is, State(〈v_i, u_i〉) = CS(〈v_i, u_i〉) ∧ FS(〈v_i, u_i〉). A subgraph mapping M is composed of multiple node pairs, and the number of node pairs equals the number of query vertices, i.e., |M| = |V_q|. Given a subgraph mapping M = {〈v_1, u_1〉, 〈v_2, u_2〉, . . ., 〈v_n, u_n〉}, a partial subgraph mapping is a subset of sequential node pairs, defined as M_i = {〈v_1, u_1〉, 〈v_2, u_2〉, . . ., 〈v_i, u_i〉}, where i ≤ n and M_i ⊆ M.
Here, v_i denotes the data vertex matched at the i-th position of the matching order q_T. The sequential verification checks that the sequential numbers of the selected node pair agree with those of the partial subgraph mapping, as described in Verification 2.
Verification 2 (sequential consistency). Given a partial subgraph mapping M_{i−1} and a selected node pair 〈v_i, u_i〉, data vertex v_i is sequentially consistent with query vertex u_i if and only if it satisfies the following constraint: Here, SN(v_i) refers to the sequential numbers of v_i, which are transitively assigned from the numbers of root candidates. For example, the sequential number 2 of v11 in Figure 5(b) indicates a reachable path from v2 to v11.
For vertices v and v′ of FG, if v and v′ are followers of different dominators with no dominator in common, then v and v′ cannot conduct a subgraph mapping together, according to Lemma 2. For example, given sequential number 1 of v10 and sequential numbers 2 and 3 of v7 in Figure 5(b), no subgraph mapping containing both v7 and v10 can be found, because v7 and v10 are located in different sequential flow subgraphs, FG(v1) and FG(v2, v3) (Algorithm 4).
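The two verifications can be sketched as predicates on a partial mapping. Because the constraints of Verifications 1 and 2 are not reproduced in this text, the sketch assumes a common reading: relational consistency requires State(〈v_i, u_i〉) = 1, an unused data vertex, and adjacency to every mapped data vertex whose query vertex is adjacent to u_i; sequential consistency requires a sequential number shared by v_i and all previously mapped vertices. The function names and inputs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (data vertex, query vertex)


def relational_ok(v: str, u: str, partial: List[Pair], state: Dict[Pair, int],
                  q_adj: Dict[str, Set[str]], g_adj: Dict[str, Set[str]]) -> bool:
    """Assumed relational consistency: active node pair, injective, and edge-preserving."""
    if state.get((v, u), 0) != 1:
        return False
    if any(w == v for w, _ in partial):  # keep the mapping injective
        return False
    # every already-mapped query neighbor of u must be matched by a data neighbor of v
    return all(w in g_adj.get(v, set()) for w, uq in partial if uq in q_adj.get(u, set()))


def sequential_ok(v: str, partial: List[Pair], sn: Dict[str, Set[int]]) -> bool:
    """Assumed sequential consistency: v shares a sequential number with every mapped vertex."""
    common = set(sn.get(v, set()))
    for w, _ in partial:
        common &= sn.get(w, set())
    return bool(common)


sn = {"v7": {2, 3}, "v10": {1}, "v11": {1, 2, 3}}
print(sequential_ok("v7", [("v10", "u1")], sn))   # False: no common flow subgraph
print(sequential_ok("v11", [("v7", "u2")], sn))   # True: both carry 2 and 3
```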
The incremental subgraph matching is described in Algorithm 4. The inputs are a merged flow graph △FG_i and a matching tree △q_T. The output is the set of subgraph mappings M of △q_T on △FG_i. A subgraph mapping is output when i = |V_T| (Lines 1-2).
One module iteratively conducts all subgraph mappings of the initial graph (Lines 5-7). Another module iteratively conducts all subgraph mappings of the graph update stream (Lines 11-14). RCVaild(v, u) and SNVaild(v, u) verify the relational and sequential consistencies of the selected node pairs, respectively (Lines 6 and 12). The selected node pairs are inserted into the subgraph mapping M if their verifications succeed (Lines 7 and 13). FG(u.successor) and △FG(u.successor) acquire the successor of u in q_T and △q_T, respectively, for traversing the node pairs of query vertices in forward order. FG(u.precursor) and △FG(u.precursor) backtrack to the precursor of u in the backward order of q_T and △q_T, respectively.
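A compact backtracking sketch in the spirit of the initial-graph module of Algorithm 4 is given below. It assumes the same readings of the two verifications as in the preceding sketch (they are inlined here so the block runs on its own), follows a fixed matching order, and tracks the set of shared sequential numbers incrementally. The update-stream module with △q_T is not modeled, and all names and inputs are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (data vertex, query vertex)


def match(order: List[str], cand: Dict[str, List[str]], state: Dict[Pair, int],
          sn: Dict[str, Set[int]], q_adj: Dict[str, Set[str]],
          g_adj: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    results: List[Dict[str, str]] = []

    def extend(i: int, partial: List[Pair], common: Optional[Set[int]]) -> None:
        if i == len(order):  # every query vertex is mapped
            results.append({u: v for v, u in partial})
            return
        u = order[i]
        for v in cand[u]:
            # relational consistency (assumed reading)
            if state.get((v, u), 0) != 1 or any(w == v for w, _ in partial):
                continue
            if not all(w in g_adj.get(v, set()) for w, uq in partial if uq in q_adj.get(u, set())):
                continue
            # sequential consistency: keep only numbers shared with the partial mapping
            nums = sn.get(v, set())
            shared = set(nums) if common is None else common & nums
            if not shared:
                continue
            partial.append((v, u))
            extend(i + 1, partial, shared)
            partial.pop()  # backtrack

    extend(0, [], None)
    return results


order = ["u1", "u2"]
cand = {"u1": ["r1", "r2"], "u2": ["x", "y"]}
state = {("r1", "u1"): 1, ("r2", "u1"): 1, ("x", "u2"): 1, ("y", "u2"): 1}
q_adj = {"u1": {"u2"}, "u2": {"u1"}}
g_adj = {"r1": {"x"}, "r2": {"x", "y"}, "x": {"r1", "r2"}, "y": {"r2"}}
sn = {"r1": {1}, "r2": {2}, "x": {1, 2}, "y": {2}}
print(match(order, cand, state, sn, q_adj, g_adj))
# [{'u1': 'r1', 'u2': 'x'}, {'u1': 'r2', 'u2': 'x'}, {'u1': 'r2', 'u2': 'y'}]
```

Carrying the running intersection of sequential numbers means each extension step checks sequential consistency with a single set intersection instead of rescanning the partial mapping.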

Experimental Evaluation
We conduct extensive performance studies to evaluate our incremental subgraph matching (iPM). All experiments are performed on an Intel Xeon E7520 processor with 12 MB of L3 cache. The system is equipped with 32 GB of main memory and runs a 64-bit Linux 3.13.0 kernel.

Experimental Settings.
The performance of the algorithm mainly depends on two aspects: the query graph and the sliding window. The influencing factors include the Query Factor (QF), the Query Shape Factor (QSF), and the Data Window Factor (DWF).
Since the flow graph index employs the structural features and semantic labels of the query graph to prune the noncandidates of the data graph, the query graph is closely related to the size of the flow graph index. The size of the query graph is denoted as the query factor QF. The query shape factor QSF refers to the shape of the query graph, such as chain, star, cyclic, and chain-star shapes [44], as shown in Figure 7. QSF is closely related to the density of the adjacent structure. For example, the adjacency structure of a star-shaped query is denser than that of a chain-shaped query.
Hence, a star-shaped query can prune more noncandidates than a chain-shaped query, because a query with a denser structure prunes more noncandidates from the original data graph than a simpler one.
The data window factor DWF captures the influence of the sliding window on conducting subgraph mappings. The sliding window relates the size of the changed graph update stream to the fixed size of the initial data graph. The impact factors of iPM are listed in Table 4. The initial data graph contains 1.0 × 10^4 RDF data items, which are encapsulated into the fixed-size window during the initial traversal. Query graphs of different shapes (star, chain, and cycle) and scales (1-22 triple patterns) are used to evaluate the influence of query factors on conducting subgraph mappings. The initial data graph and the subgraph mappings are analyzed in Figure 8.

DataSet.
A real-world dataset and a synthetic one are used in this paper.
(1) The NY Taxi Dataset is a publicly available real-world dataset with a total of 1 billion taxi-related RDF stream records. The dataset contains 17 different measurement values, such as taxi fares, locations, trip distance, and trip time. A query graph can contain at most 24 triple patterns.

Query Graphs.
The query graphs of different shapes (star, chain, and cycle) and scales (1-22 triple patterns) are designed to evaluate how the query influences matching on the data graph.
(1) Query-Star. A star query is a graph containing one instance node with multiple attributes. Thus, the scale of a star query depends on the number of attributes; the performance evaluation of star queries is presented in Figures 9(a) and 9(d).
(2) Query-Chain. A chain query is a graph containing multiple instance nodes linked in a line. Thus, the scale of a chain query depends on the number of instance nodes; the performance evaluation of chain queries is presented in Figures 9(b) and 9(e).
(3) Query-Cycle. A cycle query is a graph containing multiple instance nodes linked in the form of a cycle. The smallest cycle contains at least three instance nodes and three undirected edges. The scale of a cycle query is increased by repeatedly embedding one query vertex and two edges into it. The performance evaluation of cycle queries is presented in Figures 9(c) and 9(f).

Analysis of Algorithms.
In this section, we mainly examine the total execution time and the traversal time of iPM.
In the experimental evaluation, we focus on the count-based sliding window, since it can be adapted to a time-based one using a simple transformation. The initial sliding window contains 10,000 RDF data items and slides by one item at a time.
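A count-based sliding window of this kind (10,000 items in the experiments, sliding one item at a time) can be sketched with a deque; each push yields the inserted item and, once the window is full, the expired one, which together describe one graph update. The class name is hypothetical, and the time-based transformation mentioned above is not modeled.

```python
from collections import deque
from typing import Any, Deque, Optional, Tuple


class CountWindow:
    """Keeps the most recent `size` items; the oldest item expires when a new one arrives."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.items: Deque[Any] = deque()

    def push(self, item: Any) -> Tuple[Any, Optional[Any]]:
        """Insert one item and return (inserted, expired); expired is None until the window is full."""
        self.items.append(item)
        expired = self.items.popleft() if len(self.items) > self.size else None
        return item, expired


window = CountWindow(3)  # the experiments use 10,000; 3 keeps the demo small
for triple in ["t1", "t2", "t3", "t4"]:
    print(window.push(triple))
# prints ('t1', None), ('t2', None), ('t3', None), ('t4', 't1') on separate lines
print(list(window.items))  # ['t2', 't3', 't4']
```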
The performance evaluation of iPM is executed on a dataset containing 0.5 million RDF data items (about 10,000 data graphs). SPECTRA [27] is chosen as the baseline for comparison with iPM because it is a competitor of our method: it employs a set of vertically partitioned views to collect the summarized data from each event, and sibling lists to incrementally index the joined triples between views. The matched results are kept in a set of intermediate views to enable incremental evaluation upon the arrival of new events.
The ten thousand data graphs are extracted from the SNB dataset, as described in Figure 8(a). Most data graphs contain 15 to 20 RDF triples. The performance evaluation of subgraph results is presented in Figure 8(b). The number of subgraph results is evaluated within a sliding window with a sliding interval of 500 RDF triples. In the experimental curves, the number of results increases steadily before x = 1, because RDF data are continually filled into the fixed window during the initial execution. Afterwards, the curve becomes wavy, because subgraph results are incrementally produced as the graph stream updates. The total matching times of star, chain, and cycle queries are measured for different numbers of triple patterns and presented in Figures 9(a)-9(c), respectively. In these curves, our method (iPM) has a more significant advantage over SPECTRA (SPE) on star and cycle queries. As the number of triple patterns increases, iPM shows an approximately linear growth trend, while SPE is closer to an exponential growth trend. The traversal times of star, chain, and cycle queries are measured for different numbers of triple patterns and presented in Figures 9(d)-9(f), respectively. In these curves, the traversal time of iPM first increases and then decreases as the query graph grows, while SPE is closer to a linear or exponential growth trend.
This variation in traversal time indicates that massive numbers of RDF triples are filtered out by candidate verification. Thus, the structure and labels of the query graph help reduce the noncandidates in the flow graph index. The performance evaluations on the NY Taxi dataset are shown in Figures 10(a) and 10(b). Figure 10(a) depicts the effect of different sliding sizes on the intermediate results; this trend can be used to identify the most suitable sliding size for continuous subgraph matching, which is 200 on the NY Taxi dataset. Figure 10(b) depicts the matching time for different query scales under this most suitable sliding size. In the experimental curves, the matching time first increases and then decreases as the number of triple patterns grows, which suggests that large amounts of RDF data are filtered out after x = 18.
The experimental results show that our method can handle complex query graphs (i.e., star and cycle queries) and large datasets, and that it also performs better on chain queries.

Conclusions
In this paper, a flow graph index is first proposed to prune the noncandidates of query vertices. The flow graph FG is a directed multigraph, which is constructed from the initial data graph G_0 under the guidance of a matching order of the query graph. Then, a sequential subdivision technique for the flow graph is employed to limit the search derivation of incremental subgraph matching. The sequential numbers of root candidates are assigned to the vertices of the divided flow graph and limit the search space originating from changed candidates of FG. To conduct subgraph mappings incrementally, a state transition model consisting of three states and six transition rules is first used to describe the state transitions of changed candidates. Based on the state transition model, we analyze the influence of changed candidates on their adjacent region and design our incremental maintenance strategy. Then, an incremental subgraph matching algorithm is executed on the sequentially divided flow graph. The consistency of subgraph matching is guaranteed by two verifications of selected candidates: relational and sequential verification. Finally, extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.

Data Availability
The NY Taxi data used to support the findings of this study have been deposited in the repository http://chriswhong.com/open-data. Previously reported Social Network Benchmark (SNB) data were used to support this study and are available at DOI: 10.1145/2723372.2742786. These prior studies are cited at relevant places within the text as references [22,27].

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.