Research Article A Graph-Based Method for IFC Data Merging

Collaborative work in the construction industry has always been one of the problems addressed by BIM (Building Information Modeling) technology. The integration of IFC (Industry Foundation Classes) data, as a general building information standard, is one of the indispensable functions in collaborative work. At present, the most practical approach to merging IFC data depends on GUID (Globally Unique Identifier) comparison. However, GUIDs are not stable in current applications and often change on export. An intact representation of the relationships between IFC entities is an essential prerequisite for properly associating IFC entities during mergence. This paper proposes a graph-based method for IFC data merging. The IFC data are represented as a graph data structure, which completely preserves the relationships between IFC entities. IFC mergence is accomplished by associating the remaining data with an isomorphic graph that is obtained by mining the IFC graphs. The feasibility of the method is demonstrated by a program, and the method is unaffected by GUID changes and other such factors.


Introduction
Modern construction engineering often involves many specialties, and the construction process is complex [1,2]. Different participants need to exchange information during the construction period, which requires a large amount of collaborative data work. Information exchange in collaborative work depends on heterogeneous and domain-specific data formats [3], and the conventional approach uses different forms of drawings and documents. The collaborative process therefore requires manual data extraction and transformation, which results in poor efficiency.
With the development of BIM (Building Information Modeling), many researchers in the field of construction engineering have used BIM to solve the problems of construction collaboration. It is common for construction engineering participants to acquire and store information through a BIM server or management system [4,5]. A BIM server manages building information in a uniform data format, and different participants share information through an integrated database that covers the different specialties. Since IFC (Industry Foundation Classes) was published by buildingSMART to provide a collaborative environment for different participants [6,7], BIM servers have usually used IFC as their data standard. IFC enables the representation and exchange of a wide variety of project-relevant information in a single data format, and an MVD (Model View Definition), as a subset of IFC, helps participants obtain the information they require by defining templates for information exchange [8]. When a BIM server receives information from different specialties, it needs to integrate that information into a complete data set, so integrating IFC data has become one of the indispensable functions of BIM servers [9]. A common way to integrate IFC data is to identify identical data by the GUID (Globally Unique Identifier) attribute of IFC entities and to merge the entities whose GUID attributes differ [10]. A GUID is a globally unique ID used to identify a component or IFC entity. However, IFC files are currently not created directly but are exported by BIM software, so the same component receives different GUIDs when exported from different tools [11]. Integrating IFC entities by GUID is therefore error-prone. Moreover, the entities in an IFC model are related to each other, and these relationships can be used to determine identical data during mergence, yet they are not considered in existing tools and collaborative servers.
In addition, some tools only merge the building elements in the view [12], and the underlying data are not merged. Therefore, a graph-based IFC data merging method is proposed in this paper. The IFC data are represented by a graph, and the graphs are mined to obtain the largest common graph, which is then used to merge the IFC graphs; this avoids the influence of GUID and suits different merge scenarios. Section 2 reviews the related work and summarizes the existing problems. Section 3 introduces the graph-based merging method. Section 4 presents and analyzes the experimental results. The final section draws the conclusions, summarizes the contributions, and discusses future work.

Related Work
IFC mergence, as an indispensable function of IFC-based collaborative work, is addressed by many tools and research efforts. However, problems remain, which are analyzed here from the two perspectives of tools and studies. From the perspective of tools, many open-source IFC parsers and commercial applications lose IFC semantics during mergence, because they combine IFC entities by comparing GUIDs. BIMserver [13] is an open-source IFC server developed in Java, which supports the parsing of IFC and provides an integration function. However, since the integration is a textual mergence of two IFC files, two IfcProject entities appear in the resulting file, which violates the IFC rule that only one IfcProject entity is allowed per file. xBIM [14] is an open-source IFC parser developed in C#, which supports parsing and model display for IFC2X3 and IFC4. It also provides an IFC mergence function, which uses IfcDocumentInformation to represent the different IFC files, but the actual files are not changed. BIMVision [15] is a free model browser that provides plug-ins for merging IFC files. IFC data can be merged in three ways: by GUID, by elevation, and by name. However, the results are saved only in the private BVF format, which hinders information sharing. FZKViewer [16], developed by Karlsruhe Institute of Technology, is an IFC browser that can display the model after merging, but the relationship entities are lost after export. Although ArchiCAD [17] supports merging different IFC files by GUID, entities with the same GUID are still duplicated after merging. There are also free or commercial modeling tools and IFC browsers, for example, DDS-CAD Viewer [18], Areddo [19], and Tekla BIMsight [20], which merge IFC files only in the model view. Therefore, most of these tools are not adequate for IFC mergence. Methods of IFC mergence have also been proposed in some studies of IFC collaboration.
The concept of a "deferred reference" was proposed to solve the mergence problem of different IFC models [21]. A new entity, IfcDeferredReference, is created to replace the spatial structure entity or component entity to be merged in the submodel. During mergence, GUIDs are compared and each IfcDeferredReference is replaced with the corresponding entity, so that the submodel can be modified independently while integrity is preserved. However, the spatial structures or construction entities in other submodels need to be changed to IfcDeferredReference in advance, and the structure of IFC is altered, so parsers require further changes to understand the model. EDMmodelServer [22] was used to test collaboration issues of BIM servers in Singh's research [23]; it merges IFC files through check-in and check-out functions. These functions mark building components with their GUIDs and replace existing components according to GUID when the model is reimported. A special model server property is generated at check-out, and mergence errors occur if the modeling software does not preserve it. At present, only ArchiCAD preserves it.
Berlo et al. [24] and Nour [25] proposed mergence methods based on GUID. In Nour's research, the geometric data are returned to the user for a decision through a model view, and the rest are compared by GUID. The IFC data are converted into tree data structures, which are compared by the positions and values in the tree. Because GUIDs change between software packages, this approach only handles the case in which GUIDs do not change. Shi et al. [26] also converted IFC data into a tree structure to compare and delete identical entities, which improves the efficiency of calculating the differences between IFC files. However, an identical entity to be removed may be associated with different information in the actual project, and deleting it causes changes to other information [9]. In addition, the tree structure retains only the reference relationships between the data and loses the semantics of those relationships. Data with different relationships may be located at the same position in the tree, which causes identification errors.
Most of the above studies merge IFC data by relying on GUID, which cannot support all mergence scenarios and easily causes model duplication when GUIDs change. Extending IFC with new entities requires additional modifications to the IFC parser, which indirectly increases the complexity of information sharing. Representing an IFC model by a tree structure destroys the relationship semantics of the data. To solve these problems, IFC data require a representation that conforms to the structure of IFC, keeps IFC information intact during merging, and allows the relationships between IFC entities to be correlated correctly without being affected by GUID. This paper proposes a graph representation of the IFC model, and the IFC graphs are mined to compare IFC models in a way that is unaffected by GUID. Finally, the mergence of IFC data is completed by rebuilding the relationships between entities.

Graph-Based IFC Merging Method
The merging method is presented in three parts in this section: (1) The IFC model is transformed into a graph structure. (2) The maximum common graph of the IFC graphs is mined to identify the identical data in the IFC models. (3) The different IFC graphs are combined to complete the mergence of IFC data.

Graphic Structure of IFC Data.
According to the IFC model description, an instance can be an attribute of other instances, which is expressed by citing the instance number in the IFC file. For example, in Figure 1, #42 and #118 are both cited by #278, and #42 is also cited by #118. Therefore, an IFC file contains instances with multilevel references and cross-references. A graph is a data structure consisting of a set of vertices and a set of edges, which matches these characteristics of IFC entities. The graphical representation of the IFC model has been studied before. The RDF semantic graph of Pauwels et al. [27] is used to store and represent IFC data. RDF is applied to infer domain-specific knowledge from semantic graphs, and a large number of vertices and edges are added to the graph to describe the ontology, which increases the complexity of the graph. However, domain knowledge is a huge system, and it is unnecessary for merging IFC data. A directed graph is used to represent the relationships between IFC entities in the study of Arthaud and Lombardo [28]. The relationship between an inverse attribute and an attribute is represented by a bidirectional edge, and nodes that are cited by other nodes are duplicated in the IFC graph.
Tauscher et al. [29] proposed representing relationship entities as nodes that are connected with other entities by two edges of opposite direction. The graphs in the above methods clearly need more storage space. Inverse attributes are not required in merging, so they need not be represented in the graph.
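The instance-number citations described at the start of this subsection can be made concrete with a short sketch. The fragment below is a hypothetical, simplified STEP-style snippet (it is not the content of Figure 1, and the attribute lists are abbreviated), and `extract_references` is an illustrative helper name. It only shows how multilevel references and cross-references arise from instance numbers; it assumes that no `#` character appears inside quoted strings.

```python
import re

# Hypothetical, simplified STEP lines (not the content of Figure 1).
IFC_LINES = [
    "#42=IFCOWNERHISTORY($,$,$,.ADDED.,$,$,$,0);",
    "#100=IFCPROJECT('guid-p',#42,'Project',$,$,$,$,$,$);",
    "#118=IFCBUILDING('guid-b',#42,'Building',$,$,$,$,$,.ELEMENT.,$,$,$);",
    "#278=IFCRELAGGREGATES('guid-r',#42,$,$,#100,(#118));",
]

LINE_RE = re.compile(r"^#(\d+)\s*=\s*([A-Z0-9_]+)\((.*)\);$")
REF_RE = re.compile(r"#(\d+)")


def extract_references(lines):
    """Map each instance number to (entity name, cited instance numbers)."""
    refs = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip header lines and anything that is not an instance
        number = int(match.group(1))
        name = match.group(2)
        # Assumption: every '#<digits>' in the argument list is a reference.
        cited = [int(r) for r in REF_RE.findall(match.group(3))]
        refs[number] = (name, cited)
    return refs


references = extract_references(IFC_LINES)
```

In this fragment, #278 cites #42 and #118 while #118 also cites #42, which is the cross-reference pattern that a tree cannot represent without duplicating #42.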
To simplify the structure of the graph and facilitate the mergence of IFC files, a directed node graph suited to IFC mergence is proposed to represent IFC data. Both nodes and edges have labels and values in the graph. Nodes are classified into three categories by label, namely, SingleValue, ListValue, and Entity. The value types of IFC are labeled SingleValue or ListValue; they consist of IfcSimpleValue, IfcMeasureValue, and IfcDerivedMeasureValue in the IFC specification. IfcSimpleValue is similar to a primitive data type in programming languages, such as Integer and Logical. IfcMeasureValue is a value in a basic unit described in ISO 31-0:1992, such as length and area. IfcDerivedMeasureValue is derived from different basic units, such as voltage and density. Value-type items in IFC files do not cite other entities, so they are the terminal nodes of the graph. The difference between SingleValue and ListValue is whether the value is single or an aggregate in IFC. For example, the description of IfcCartesianPoint in the EXPRESS specification is shown in Figure 2. The value of the attribute marked in red is a list of IfcLengthMeasure.
The nodes labeled Entity represent the entities in an IFC file, whose attributes refer to other entities or values.
If the node is labeled Entity, the value of the node is the entity name. If the node is labeled SingleValue or ListValue, the value of the node is the actual value in the IFC file.
There are two categories of edge labels: Relationship and Attribute. A relationship entity in IFC is represented by an edge labeled Relationship.
The relationship entities define different relationships between entities, including spatial logic relationships, attribute connections, component relationships, and work resource assignments. All relationship entities inherit from IfcRelationship. The EXPRESS description of IfcRelationship is shown in Figure 3. IfcRelationship inherits from IfcRoot, whose attributes describe relationship creation information. These attributes are of no use in IFC mergence. Although some relationship entities carry special attributes of their own, this does not affect the mergence of IFC data, because the IFC files are regenerated from the entities in the original IFC file that correspond to these relationships in the graphs. The two ends of the edge are connected to the two entity nodes that are correlated by the relationship entity. If one side of the relationship is a set of entities, every entity in the set is connected to the entity on the other side by an edge with the same label. The direction of the edge is not specified, which does not affect graph mining.
IFC entity nodes are connected to their attribute nodes by edges labeled Attribute. Each such edge points from the entity node to its attribute node. Every IFC entity is similar to a class described in the EXPRESS language, whose attributes may themselves be other classes. For example, the type of the attribute OwnerHistory, marked in red in Figure 3, is IfcOwnerHistory. The value of an edge labeled Relationship is the name of the relationship entity, and the value of an edge labeled Attribute is the attribute name. The IFC graph representation is illustrated by the IFC graph shown in Figure 4, which is derived from the IFC fragment shown in Figure 5. Entity-type nodes are drawn in orange and value-type nodes in blue. Relationship entities are drawn as red edges and the relationships between an entity and its attributes as black edges, whose labels are not displayed.
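The labeled graph described in this subsection can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the authors' C# implementation: the class names `Node`, `Edge`, and `IfcGraph`, the integer node ids, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: int   # illustrative internal id, not the IFC instance number
    label: str     # "Entity", "SingleValue", or "ListValue"
    value: object  # entity name for Entity nodes, else the actual value(s)


@dataclass(frozen=True)
class Edge:
    source: int
    target: int
    label: str     # "Relationship" or "Attribute"
    value: str     # relationship entity name, or attribute name


@dataclass
class IfcGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, label, value):
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, label, value)
        return node_id

    def add_edge(self, source, target, label, value):
        self.edges.append(Edge(source, target, label, value))

    def attributes_of(self, node_id):
        """Return (attribute name, attribute node) pairs of an entity node."""
        return [(e.value, self.nodes[e.target]) for e in self.edges
                if e.label == "Attribute" and e.source == node_id]


# Hypothetical example: a project aggregating a building, plus a point.
graph = IfcGraph()
project = graph.add_node("Entity", "IfcProject")
building = graph.add_node("Entity", "IfcBuilding")
name = graph.add_node("SingleValue", "Building A")
point = graph.add_node("Entity", "IfcCartesianPoint")
coords = graph.add_node("ListValue", (0.0, 0.0, 0.0))

# The relationship entity becomes an edge whose value is its name.
graph.add_edge(project, building, "Relationship", "IfcRelAggregates")
# Attribute edges point from the entity node to its attribute node.
graph.add_edge(building, name, "Attribute", "Name")
graph.add_edge(point, coords, "Attribute", "Coordinates")
```

Storing the relationship entity as an edge value rather than as a node is what keeps this graph smaller than the representations discussed above; the relationship's own attributes are recovered later from the original file.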

Mining the Maximum Common Subgraph of the IFC Graph.
The goal of maximum common graph mining is to find the isomorphic IFC graph with the largest number of nodes shared by different IFC graphs. Nodes are compared while the graph is traversed, but not every node needs to be compared. Different traversal methods are adopted according to the characteristics of the nodes in the graph.
It is known from the structure of the IFC graph that entity nodes connect to their attribute nodes by edges labeled Attribute. Some entities are not referenced by other entities in an IFC file; that is, they are not attributes of other entities (except relationship entities). These entities have no incoming edges labeled Attribute in an IFC graph, such as the IfcBuilding node in Figure 4.
According to these features, a transformation is performed on the IFC graph before mining: each node labeled Entity that has no incoming edge labeled Attribute is regarded, together with its attribute nodes, as an integrated part, that is, a node set with this entity node as the root node, such as IfcBuilding and its attribute nodes in Figure 6. Each integrated part is represented as a single entity node, so that Figure 4 is transformed into Figure 7. Note that the nodes in such a node set are not exclusive to it, because different entities may share the same attribute. The graph is traversed in two ways during mining. One is the traversal of the transformed IFC graph, in which an entity node and its attribute nodes are regarded as an integrated part and which contains only edges labeled Relationship. The other is the traversal of a node set, which is composed of the entity node and its attribute nodes gathered in the transformation. Both mining methods are illustrated by examples below. To mine a node set, the nodes are traversed depth-first along edges labeled Attribute, with the entity node as the starting point. Each node is checked for an identical node in the other graph, where identical means that both the label and the value of the node are the same. If a node is not matched, the traversal stops. Conversely, the node set is included in the maximum common subgraph if all of its nodes are matched. The mining of a node set is illustrated in Figure 8.
A node set with the entity node as the root node (tagged "R" in the figure) is shown in each of Figures 8(a) and 8(b), where the letters indicate the values of edges and nodes. Assuming that the "R" nodes are identical in both graphs, the traversal proceeds as follows:
(1) The traversal starts at R and moves to RE1a, and the same is found in Figure 8(b).
(2) It moves to aE3d, and the same is found in Figure 8(b).
(3) It moves to dE5g, but dE5g is not found in Figure 8(b).
(4) The traversal exits, and the result is that the root entity node in Figure 8(a) is not identical to the compared node in Figure 8(b).
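The node-set comparison walked through in the steps above can be sketched as a depth-first traversal that stops at the first mismatch. This is an illustrative sketch, not the authors' code: the function name and the dictionary layout are assumptions, attribute names are assumed to be unique per node, and the additional requirement that both nodes have the same number of attribute edges is also an assumption. The letters mirror the style of the example but do not reproduce Figure 8.

```python
def node_sets_match(nodes_a, attrs_a, root_a, nodes_b, attrs_b, root_b):
    """Compare two node sets depth-first along Attribute edges.

    nodes_x maps a node id to (label, value); attrs_x maps a node id to a
    list of (attribute name, child id). Returns False at the first mismatch.
    """
    if nodes_a[root_a] != nodes_b[root_b]:
        return False
    stack = [(root_a, root_b)]
    visited = set()
    while stack:
        a, b = stack.pop()
        if (a, b) in visited:
            continue  # shared attribute nodes are compared only once
        visited.add((a, b))
        out_a = attrs_a.get(a, [])
        out_b = attrs_b.get(b, [])
        if len(out_a) != len(out_b):
            return False
        by_name_b = dict(out_b)  # assumption: attribute names are unique
        for attr_name, child_a in out_a:
            child_b = by_name_b.get(attr_name)
            if child_b is None or nodes_a[child_a] != nodes_b[child_b]:
                return False  # edge value or node label/value differs
            stack.append((child_a, child_b))
    return True


# Hypothetical node sets: R -E1-> a -E3-> d -E5-> g in the first graph;
# the second graph ends in a node with a different value.
nodes_1 = {"R": ("Entity", "R"), "a": ("Entity", "a"),
           "d": ("Entity", "d"), "g": ("SingleValue", "g")}
attrs_1 = {"R": [("E1", "a")], "a": [("E3", "d")], "d": [("E5", "g")]}
nodes_2 = {"R": ("Entity", "R"), "a": ("Entity", "a"),
           "d": ("Entity", "d"), "h": ("SingleValue", "h")}
attrs_2 = {"R": [("E1", "a")], "a": [("E3", "d")], "d": [("E5", "h")]}

same = node_sets_match(nodes_1, attrs_1, "R", nodes_1, attrs_1, "R")
different = node_sets_match(nodes_1, attrs_1, "R", nodes_2, attrs_2, "R")
```

Stopping at the first mismatch reflects the all-or-nothing rule in the text: a node set enters the common subgraph only if every node in it is matched.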
The transformed graph, in which each entity node and its attribute nodes are regarded as an integrated part, is mined by traversing the entity nodes depth-first along edges labeled Relationship, with the IfcProject node as the starting point, because IfcProject, as the only entity of its type in the IFC file, is unambiguous.
When an entity node is not matched, which means that the node set composed of the entity node and its attribute nodes has no isomorphic subgraph in the other IFC graph, the current traversal path terminates and the traversal restarts from the bifurcation point on the previous path. The mining method is illustrated in Figure 8. Each entity node and its attribute nodes are represented by a single node in Figures 8(a) and 8(b), where the "R" node is the IfcProject node. It is assumed that the "R" nodes are identical in both graphs. The graphs are regarded as undirected connected graphs because edges labeled Relationship are undirected. The traversal steps are as follows:
(1) Starting with RE1a, an isomorphic part is found in Figure 9(b). The recorded common subgraph is Figure 9(a).
(2) Next is aE3d, and an isomorphic part is found in Figure 8(b). The recorded common subgraph is Figure 9(b).
(3) Next is dE5g, but dE5g is not found in Figure 8(b). The traversal returns to the bifurcation point on the previous path and continues.
(4) Next is RE1b, and an isomorphic part is found in Figure 8(b). The recorded common subgraph is Figure 9(c).
(5) Next is bE3e, but bE3e is not found in Figure 8(b). The traversal returns to the bifurcation point on the previous path and continues.
(6) Next is RE2c, but RE2c is not found in Figure 8(b). All paths have been traversed.
The maximum common subgraph is shown in Figure 9(c). When the traversal reaches an entity node, its attribute node set is mined with the previous method. The isomorphic subgraph, composed of entity nodes and attribute nodes, is recorded as part of the common subgraph.
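The traversal of the transformed graph can be sketched in the same spirit. In the sketch below, each integrated part is reduced to a hashable signature that stands in for the result of the node-set comparison, and the names `undirected` and `mine_common` are hypothetical. The pairing is greedy, so it is a simplification: when several neighbours carry identical signatures it is not guaranteed to find the largest common subgraph. The example edges follow the six steps listed above but are not a reproduction of the figures.

```python
def undirected(edges):
    """Build a symmetric adjacency list from (u, relationship name, v)."""
    adjacency = {}
    for u, rel, v in edges:
        adjacency.setdefault(u, []).append((rel, v))
        adjacency.setdefault(v, []).append((rel, u))
    return adjacency


def mine_common(sig_a, rel_a, sig_b, rel_b, root_a, root_b):
    """Depth-first mining from the IfcProject nodes along Relationship edges.

    Returns a dict that maps each matched node of graph A to its
    counterpart in graph B. A branch ends when no counterpart is found.
    """
    if sig_a[root_a] != sig_b[root_b]:
        return {}
    matched = {root_a: root_b}
    used_b = {root_b}
    stack = [root_a]
    while stack:
        a = stack.pop()
        b = matched[a]
        for rel_name, next_a in rel_a.get(a, []):
            if next_a in matched:
                continue
            for rel_name_b, next_b in rel_b.get(b, []):
                if (next_b not in used_b and rel_name_b == rel_name
                        and sig_a[next_a] == sig_b[next_b]):
                    matched[next_a] = next_b  # mark as common
                    used_b.add(next_b)
                    stack.append(next_a)
                    break
    return matched


# Hypothetical graphs; the signature of a node is simply its letter.
edges_a = [("R", "E1", "a"), ("a", "E3", "d"), ("d", "E5", "g"),
           ("R", "E1", "b"), ("b", "E3", "e"), ("R", "E2", "c")]
edges_b = [("R", "E1", "a"), ("a", "E3", "d"), ("R", "E1", "b"),
           ("d", "E5", "x"), ("b", "E4", "e")]
sig_a = {n: n for n in "Radgbec"}
sig_b = {n: n for n in "Radbxe"}

common = mine_common(sig_a, undirected(edges_a),
                     sig_b, undirected(edges_b), "R", "R")
```

Here the branches through dE5g, bE3e, and RE2c end without a match, so only R, a, d, and b are marked as common.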
Each entity node that is not referenced by other entities represents a specific concept in an actual project, such as a wall, a column, or a construction process. Therefore, if any attribute node of such an entity is not matched, it is meaningless to compare the other nodes in the node set composed of the entity node and its attribute nodes. The entire IFC graph is composed of these entity nodes and their attribute nodes, which connect to the IfcProject node through different relationships.
Thus, the graph is traversed along the Relationship edges. There are also special entities in IFC files that are neither connected to IfcProject via an IfcRelationship nor referenced by other entities. For example, IfcStyledItem, which represents the style information of geometry in IFC, usually shares some attributes with other entities. The mergence of these nodes is explained in the following section. The whole process starts with an entity node that is not referenced by other entities. If the nodes in its attribute node set are not matched to nodes in the other graph, this entity node and its attribute nodes are excluded from the common graph. The current traversal branch is terminated, and the nodes in other branches continue to be traversed until the end. Conversely, if the nodes are matched, they are marked as nodes of the common IFC graph. The maximum common IFC graph is composed of all marked nodes. The whole process is shown in Figure 10.
Two special situations arise in practical application, pertaining to nodes and edges. The first is the impact of GUID. The GUID is an attribute that entities inherit from IfcRoot and is the unique identity of an entity. However, GUIDs may change depending on whether the files are exported from the same or different software. Therefore, when the GUIDs in the IFC files are known to be unstable, all Attribute edges whose value is GlobalId are deleted from the IFC graphs, which excludes GUID from the comparison. Conversely, when the GUIDs in the IFC files are known not to change, the Attribute edges whose value is GlobalId are retained, and the other Attribute edges leaving the same entity node as a GlobalId edge are deleted, which makes GUID the only comparison criterion. The second is the impact of IfcOwnerHistory. IfcOwnerHistory, which is an attribute of entities that inherit from IfcRoot, describes the history and identification of building components, such as the creator and the creation time. Whenever the model is reopened and saved in software, IfcOwnerHistory is changed.
Therefore, this information cannot be used for comparison when merging IFC models. The preprocessing step deletes all edges connected to IfcOwnerHistory nodes in the IFC graph, and the IfcOwnerHistory entity is reconstructed after merging.
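The two preprocessing rules can be sketched as a filter over the edge list. The tuple layout `(source, target, label, value)`, the function name `preprocess`, and the flag `guid_stable` are assumptions made for illustration. The sketch identifies the relevant edges by the attribute names `GlobalId` and `OwnerHistory`, and it reads "the same node" in the GUID rule as the entity node from which the GlobalId edge leaves.

```python
def preprocess(edges, guid_stable):
    """Drop edges that must not take part in the comparison.

    edges is a list of (source, target, label, value) tuples.
    """
    def is_attr(edge, name):
        return edge[2] == "Attribute" and edge[3] == name

    # IfcOwnerHistory changes whenever a model is re-saved: always drop it.
    kept = [e for e in edges if not is_attr(e, "OwnerHistory")]

    if not guid_stable:
        # Unstable GUIDs: remove GlobalId so it plays no role in matching.
        return [e for e in kept if not is_attr(e, "GlobalId")]

    # Stable GUIDs: for entities that carry a GlobalId edge, keep only
    # that Attribute edge so the GUID becomes the sole criterion.
    with_guid = {e[0] for e in kept if is_attr(e, "GlobalId")}
    return [e for e in kept
            if e[2] != "Attribute"
            or e[0] not in with_guid
            or e[3] == "GlobalId"]


# Hypothetical edge list for one wall contained in a storey.
EDGES = [
    ("wall", "guid-1", "Attribute", "GlobalId"),
    ("wall", "history", "Attribute", "OwnerHistory"),
    ("wall", "name", "Attribute", "Name"),
    ("storey", "wall", "Relationship", "IfcRelContainedInSpatialStructure"),
]

unstable = preprocess(EDGES, guid_stable=False)
stable = preprocess(EDGES, guid_stable=True)
```

Relationship edges are never removed by either rule, so the traversal paths of the transformed graph are unaffected by the preprocessing.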

IFC Graph Merging.
After the maximum common graph has been mined, the nodes and edges that are not in it are reconnected to the maximum common graph according to their connections in the original graphs. Two IFC graphs to be merged are shown in Figures 11(a) and 12(b), respectively. The entity nodes that have no incoming edge labeled Attribute are denoted by single letters; the gray ones are marked, indicating that they are in the maximum common graph, and each attribute node set is represented by yellow nodes that connect to the entity node through directed edges. The special entities mentioned in the previous section are also considered; they are represented as red nodes, and their relationships with the attribute node sets are represented as directed edges in the graph. The Relationship edges are undirected in the IFC graph. The merging method is illustrated in Figure 12.
(1) The nodes adjacent to the maximum common graph are copied and connected to the common graph. Adjacent nodes are the unmarked entity nodes that connect to marked nodes through Relationship edges. This step establishes the connection with the maximum common graph, as shown in Figure 12(a).
(2) The unmarked nodes that are connected to the adjacent nodes are copied to the common graph, including a portion of the special entity nodes, as shown in Figure 12(b).
(3) The remaining special entity nodes are added to the maximum common subgraph, as shown in Figure 12(c).
Some nodes have already been replicated in the second step; thus it is necessary to check the remaining special nodes for isomorphism during the third step. The check is similar to the traversal of the attribute node set in the previous section. Starting from the special entity node, the nodes it points to are traversed depth-first: if a different node or edge is found, the traversal is terminated and the special nodes are added. Otherwise, these nodes already exist and do not need to be added.
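The three merging steps can be sketched as follows. This is a simplified illustration under several assumptions: the name `merge_graphs` and the data layout are hypothetical, each node is reduced to a hashable signature standing in for its attribute node set, steps (1) and (2) are folded into one breadth-first pass over Relationship edges, and the isomorphism check of step (3) is reduced to a signature lookup.

```python
from collections import deque


def merge_graphs(nodes_a, edges_a, nodes_b, edges_b, b_to_a, special_b=()):
    """Merge graph B into base graph A around the common (marked) nodes.

    nodes_x maps node id to signature; edges_x holds (u, relationship, v);
    b_to_a maps each marked node of B to its counterpart in A.
    """
    nodes = dict(nodes_a)
    edges = list(edges_a)
    mapping = dict(b_to_a)  # B id -> id in the merged graph

    def copy_node(b):
        new_id = ("B", b)  # fresh id so copies never collide with A's ids
        nodes[new_id] = nodes_b[b]
        mapping[b] = new_id

    adjacency = {}
    for u, _, v in edges_b:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    # Steps (1) and (2): copy adjacent nodes first, then their neighbours.
    queue = deque(b_to_a)
    while queue:
        b = queue.popleft()
        for neighbour in adjacency.get(b, []):
            if neighbour not in mapping:
                copy_node(neighbour)
                queue.append(neighbour)

    # Recreate B's edges, except those lying inside the common graph.
    for u, rel, v in edges_b:
        if u in mapping and v in mapping and not (u in b_to_a and v in b_to_a):
            edges.append((mapping[u], rel, mapping[v]))

    # Step (3): add remaining special nodes unless an identical one exists.
    existing = set(nodes.values())
    for s in special_b:
        if s not in mapping and nodes_b[s] not in existing:
            copy_node(s)
            existing.add(nodes_b[s])
    return nodes, edges


# Hypothetical graphs sharing the project R and the building a.
NODES_A = {"R": "IfcProject", "a": "IfcBuilding", "b": "IfcWall:1"}
EDGES_A = [("R", "E1", "a"), ("R", "E1", "b")]
NODES_B = {"R": "IfcProject", "a": "IfcBuilding", "c": "IfcBuildingStorey",
           "d": "IfcWall:2", "s": "IfcStyledItem:red"}
EDGES_B = [("R", "E1", "a"), ("a", "E2", "c"), ("c", "E3", "d")]

merged_nodes, merged_edges = merge_graphs(
    NODES_A, EDGES_A, NODES_B, EDGES_B, {"R": "R", "a": "a"}, special_b=["s"])
```

Because only unmarked nodes are copied, the shared project and building appear once in the merged graph, which is the property checked by the entity counts in the experiments.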
After the mergence is completed, the IFC file is regenerated from the merged IFC graph: the instances on each line are regenerated from the nodes labeled Entity and their attribute node sets. The relationship instances are regenerated from the edges labeled Relationship based on the corresponding instances in the original IFC file, where only the two relating attributes are modified. If an edge labeled Relationship exists in the maximum common graph, the IFC file referenced when regenerating this edge should be the base file of the collaborative work. This approach preserves the other attributes of relationship entities in the file, and therefore converting relationship entities into edges does not cause information loss in IFC mergence.

Experimental Results
An IFC mergence tool was developed in C# to verify the feasibility of the method. The experiment is divided into two parts: in one, IFC files are merged under the same modeling software conditions, and in the other, under different modeling software conditions. The difference is that the models in the same modeling environment are exported from the same software, whereas the IFC models in different modeling environments are exported from different software. The feasibility of the merging method is verified by comparing the counts of IFC entities in the IFC models merged by GUID and by graph. Revit and ArchiCAD, which are widely used modeling software packages, are used to export the IFC files. A simple two-story villa is used as the experimental model, which is composed of building walls, structural columns, floors, windows, and doors. The IFC files are divided into six cases according to different scenarios, as shown in Figure 13.

Same Modeling Software Environment.
Verification in the same modeling software environment covers scenario 1, scenario 2, and scenario 3. The IFC models exported from Revit and from ArchiCAD are merged, with IfcOwnerHistory preprocessed. The counts of instances are determined after mergence, including spatial entities, walls, floors, windows, doors, column entities, and other entities such as relationships and attributes. The three scenarios are validated as shown in Tables 1-3, respectively. In the tables, "a," "b," "c," "d," "e," and "f" correspond to the IFC models in Figures 12(a)-12(f), respectively, and "a-b," "c-d," and "e-f" represent the models obtained by merging models a and b, models c and d, and models e and f. "GUID" indicates that the model is merged by comparing GUIDs. "Revit" and "ArchiCAD" indicate IFC models exported from Revit and ArchiCAD.
It can be seen from Table 1 that the counts of spatial entities (IfcSite, IfcBuilding, and IfcBuildingStorey) are 1, 1, and 3 in the merged graph, which are not increased. In the mergence using GUID, by contrast, the same spatial entity is treated as different entities, and the corresponding counts increase to 2, 2, and 3. The counts of spatial entities in the mergence by GUID are equal to the sums over the two files because the same spatial entity is regarded as different entities. The same mistake also occurs in Table 2. The count of IfcBuildingStorey becomes 5, although the model obviously has only 3 floors. The number of walls in scenario 3 is 9, but the counts of IfcWall and IfcWallStandardCase in Table 3 increase to 15 in the mergence using GUID. The three-dimensional displays of the three scenarios are shown in Figure 14. The displays of scenario 1 and scenario 2 are the same because spatial entities have no geometric information and do not affect the display. In scenario 3 the external wall overlaps, because the same external wall is identified as different components when GUIDs are compared. It can be concluded that the graph-based merging method is more accurate in these scenarios, whether floors or components are merged.
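The count-based check used in this comparison can be sketched with a small helper that tallies instances per entity name in a STEP-encoded file and flags the types whose count exceeds what is expected. The function names and the sample lines are hypothetical and abbreviated; they are not taken from the experimental files or tables.

```python
import re
from collections import Counter

ENTITY_RE = re.compile(r"^#\d+\s*=\s*([A-Z0-9_]+)\(")


def count_entities(lines):
    """Count instances per entity name in STEP-encoded IFC lines."""
    counts = Counter()
    for line in lines:
        match = ENTITY_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def duplicated_types(merged_counts, expected_counts):
    """Entity names whose merged count exceeds the expected count."""
    return sorted(name for name, count in merged_counts.items()
                  if count > expected_counts.get(name, 0))


# Hypothetical merged output in which the same building appears twice.
MERGED_LINES = [
    "#1=IFCPROJECT('p',$,'Project',$,$,$,$,$,$);",
    "#2=IFCBUILDING('b1',$,'Villa',$,$,$,$,$,.ELEMENT.,$,$,$);",
    "#3=IFCBUILDING('b2',$,'Villa',$,$,$,$,$,.ELEMENT.,$,$,$);",
]

merged_counts = count_entities(MERGED_LINES)
duplicates = duplicated_types(merged_counts, {"IFCPROJECT": 1, "IFCBUILDING": 1})
```

A count above the expected value indicates that a merge treated one entity as two, which is the symptom of GUID-based merging discussed above.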

Different Modeling Software Environments.
The experiment is the same as that in the same modeling environment, except that models from Revit and ArchiCAD are merged with each other. The counts of entities after merging are shown in Tables 4-6. "R" in the tables denotes export from Revit, "A" denotes export from ArchiCAD, and a, b, c, d, e, and f have the same meaning as above.

Conclusion and Discussion
IFC is a popular data format for building information models, which makes it easier for different participants to share information. To address the problem of IFC mergence in information collaboration, this paper proposes a method for graph-based IFC data mergence. Graphs are built from the characteristics of IFC data and mined to obtain the maximum common subgraph. The mergence of the IFC graphs is completed by establishing the connections between the IFC graphs and the common graph. Finally, an ordinary two-story residential building is used as the experimental model to verify the merging method, which demonstrates the feasibility of the method.
The main contributions of this paper are as follows:
(1) IFC data can be clearly and concisely represented by graphs, in which IFC entities are represented by nodes and relationship entities are simplified to edges. This representation retains the structure of IFC data and is better suited to managing IFC data during mergence than previous graph representations.
(2) A mining method for IFC graphs is proposed. IFC graphs are traversed to obtain the maximum common subgraph using the characteristics of the graphs, without the need to traverse every IFC node. This method is more efficient when comparing IFC entities.
(3) The IFC graphs are merged to restore IFC files. The merging method, which is unaffected by GUID and other such factors, can be applied in more scenarios than previous methods. In addition, the integrity of IFC data is preserved because no IFC data are deleted during the whole process.
Information collaboration has always been a primary concern of the construction industry. The mergence of IFC data, as a general building information standard, is important for collaborative work. This paper addresses only part of the problem, and a more understandable semantic representation of IFC is required to make IFC data easier to apply in the future.

Data Availability
The IFC files used to support the findings of this study are available at https://github.com/Lyc-r/IFCmerge.

Conflicts of Interest
The authors declare no conflicts of interest.