OMFM: A Framework of Object Merging Based on Fuzzy Multisets

Information fusion is the process of merging information from multiple sources into a new set of information. Existing work on information fusion applies to scenarios such as multiagent systems, group decision making, and multidocument summarization. This paper develops a framework for solving the object merging problem based on fuzzy multisets. The objects considered here are data segments in a document fusion task: concepts together with their semantically related terms, in which different semantic relations are embedded. The fundamental operation is the merge function, which maps data segments in multiple fuzzy multisets onto one object, called a solution. Under this framework, we define the quality measures of purity and entropy to quantify the quality of a solution, balancing the accuracy and completeness of the results. The two measures are combined into a validation index (VI); a merge function that yields solutions with the best VI-value is called a VI-optimal merge function, and a series of its theoretical properties are studied. Finally, we investigate the proposed framework in a specific application scenario, document fusion, which is related to the task of multidocument summarization, and show how the framework works with an illustrative example.


Introduction
As an important research area, information fusion is the process of merging information from multiple sources into a new set of information. Applications include heterogeneous databases, multiagent systems, group decision making, and multidocument summarization. Different application scenarios call for different principles and procedures. Many classical mathematical theories of aggregation operators [1][2][3][4][5] have been developed for multiagent systems and group decision making systems; the information that aggregation operators fuse typically expresses an opinion or a score of an agent. Besides this line of research, a fair amount of work has focused on the situation where each source is regarded as a propositional belief [6][7][8]. The presence of nonfactual knowledge such as integrity constraints and inference rules is what distinguishes these two theories. As a result, much work has been done in the heterogeneous database area on first-order theories. In a third type of fusion, each source presents knowledge by means of a possibility distribution [9]; in this case, imperfections in the data such as incorrectness, uncertainty, and incompleteness must be handled. The main challenge is how to deal with conflicting information provided by different sources.
To address the issues in the third type of fusion, a framework of object merging based on multiset theory has recently been investigated, which can be used to solve the problem of multidocument summarization (MDS) [10]. Object merging is also an active research topic in many domains, with good prospects for application. The framework of multiset merging for MDS defines a merge function that maps the objects in multisets onto a single object. Its results cannot yet be considered a final summary [11,12], because they are merely keywords without any relations among them, not to mention the context of co-text, the context of culture, the context of situation, and so forth. The essential reason is that the framework defines its quality measures using the multiplicity of an element as the measure of its importance. In other words, multiplicity equals term frequency, which is only a shallow text feature. When performing source selection in MDS, the traditional method transforms a document into a vector of words or a multiset of words, which are simple representations. Approaches that are semantically richer than using words as the source representation are needed. In short, the problem of processing coreferent objects has not yet been investigated in depth. On one hand, the merging of nonquantitative objects, especially objects carrying semantic information, has not been addressed. On the other hand, object merging functions and the rationality of merging still need further investigation.
In this paper, we also focus on the problem of object merging in information fusion, and our work should be regarded as an extension of the framework mentioned above. There are several differences between the two works. The basic difference concerns the definition of coreferent objects: in [11], coreferent objects are objects describing the same entity in the real world, while in our paper an object is a piece of data or information that denotes a concept together with semantically related terms in which different semantic relations are embedded. Second, we use fuzzy multiset theory, in which a membership degree function and a length function describe both the uncertainty and the repeatability of natural language. Third, when performing fusion in practical situations, the object merging process considers deep text features, namely, semantic relations such as hypernym, synonym, and antonym. Moreover, two quality measures widely used in the text mining literature, purity [13] and entropy [14,15], are adopted to quantify the result of a merge function. Thus, the behavior of the merge functions defined in this paper can be characterized by the behavior of the quality measures, and with this strategy we can obtain an optimal merge result. A possible application of this work is document fusion [16], where a collection of textual documents is used to produce the shortest description containing all information found within the document set, but without repetition. Existing solutions for this problem normally rely on the statistical or heuristic methods used in multidocument summarization [17,18]. In this paper, object merging based on fuzzy multisets (OMFM) is an attempt at a semantically richer alternative, where a source set of multiple documents is denoted as a multiset and each document is denoted as a fuzzy multiset of multiple concepts.
This paper is organized as follows. In Section 2, we review the mathematical preliminaries. The general framework of objects and object merging is proposed in Section 3, and the definition of the quality measures and the construction of merge functions are introduced in Section 4. Section 5 demonstrates how the framework works on a practical problem, document fusion, with an illustrative example. Finally, Section 6 gives the conclusion and future work for the proposed framework OMFM.

Preliminaries
In mathematics, a fuzzy set, introduced by Zadeh in 1965, is a set whose elements have degrees of membership; it extends the classical notion of a set [19,20]. Fuzzy set theory is useful for problems that are not easily handled by classical computing techniques. The use of graded membership degrees instead of crisp membership also provides a means to measure the uncertainty that arises in the computational theory of languages. The notion of a multiset generalizes the classical notion of a set by allowing members to appear more than once. As a data structure, a multiset stands between strings, where a linear ordering of symbols is present, and sets, where no ordering is considered. Combined with the notion of a fuzzy set, the multiset is generalized to the fuzzy multiset [21], which can describe both the uncertainty and the repeatability of natural language. Consider one language modeling problem: given some sentences, identify the concepts and words that are similar or identical, and merge these objects to obtain a condensed description. This is a challenging natural language problem involving large amounts of diverse and compositional data. To solve it, we extend fuzzy multisets to produce a language model that maps data segments in multiple fuzzy multisets onto one object, where the different semantic relations of one concept are treated as repeated elements with different membership degrees. In this section, the mathematical theories of fuzzy sets, multisets, and fuzzy multisets are briefly reviewed.
Definition 1 (membership function). The membership function $\mu_A : U \to [0, 1]$ indicates the degree to which $x$ belongs to $A$. $\mu_A(x) = 1$ indicates that element $x$ completely belongs to set $A$; that is, $x \in A$ in the sense of a traditional set.
Definition 2 (fuzzy set). The membership function over $U = \{x_1, x_2, \ldots, x_n\}$ defines a fuzzy set, which is represented as $A$. A fuzzy set $A$ with elements $x_1, x_2, \ldots, x_n$ can be denoted as
$$A = \{(x_1, \mu_A(x_1)), (x_2, \mu_A(x_2)), \ldots, (x_n, \mu_A(x_n))\}.$$
According to the definition of a fuzzy set, the extent to which an object belongs to a set is no longer fixed, and the membership of each object falls in the interval [0, 1].
A multiset  also could be denoted as There are some basic operators and relations of multiset below: Inclusion: (1) Equality: Intersection: Union: Addition: Definition 4 (-cut set of multiset).The -cut set of a multiset  is denoted as   and given by   = { |  ∈  ∧ Count  () ≥ }.
Note that the difference between the notation  () and   is that the former one means assigning an index  to the multiset  and the latter one means the -cut set of the multiset  [23].
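The multiset operators listed above can be illustrated with a minimal sketch. It assumes the standard count-based definitions (inclusion compares multiplicities, intersection takes the minimum, union the maximum, and addition the sum) and uses Python's `collections.Counter` as the multiset; the helper names `is_included` and `cut_set` and the example elements are hypothetical and do not come from the paper.

```python
from collections import Counter

def is_included(a: Counter, b: Counter) -> bool:
    """Inclusion: every element occurs in `a` at most as often as in `b`."""
    return all(count <= b[x] for x, count in a.items())

def cut_set(m: Counter, k: int) -> set:
    """Cut set: the elements whose multiplicity in `m` is at least `k`."""
    return {x for x, count in m.items() if count >= k}

# Hypothetical multisets of terms (assumed example data).
a = Counter({"milk": 2, "sour": 1})
b = Counter({"milk": 3, "sour": 1, "brand": 1})

intersection = a & b  # minimum multiplicity per element
union = a | b         # maximum multiplicity per element
addition = a + b      # sum of multiplicities per element
```

Here `a` is included in `b`, so the intersection equals `a` and the union equals `b`, while `cut_set(b, 2)` keeps only the element that occurs at least twice.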
The set of all fuzzy multisets drawn from a universe $U$ is denoted as $\tilde{\mathcal{M}}(U)$.
Definition 8 (-cut set of fuzzy multiset).The -cut set of a fuzzy multiset M is denoted as M and given by Note that the difference between the notation M() and notation M is that M is preserved for the -cut set of the fuzzy multiset M, while M() means assigning an index  to the fuzzy multiset M.

Objects and Object Merging
3.1. The General Framework. We have reviewed the most relevant definitions in the previous section. As mentioned earlier, the framework in our paper extends the work in [11], so we first introduce its basis. The basis involves the redefinition of coreferent objects and of the merge function in OMFM, and a brief review of the properties of preservation and majority rule in [11].
The bases involve the redefinitions of coreferent objects and merge function in OMFM.
A reference function is formalized to describe a concept in the real world: it maps each concept in the universe to the real-world concept that it describes. By definition, two concepts are called coreferent if they describe the same real-world concept.
Definition 9 (coreferent objects). Let $U$ be a universe of concepts. Two concepts $u_1, u_2 \in U$ are coreferent if and only if the reference function maps them to the same real-world concept.
By the definition above, two objects that describe the same real-world concept with semantically related terms, in which different semantic relations are embedded, are formalized axiomatically. Here, we take the context as the baseline: when a theme is described in a document, some semantically related terms relating to this concept are used to extend the theme.

Definition 10 (merge function). The merge function in OMFM is represented by function 𝜛 : M(𝑈) → 𝑈.
Mapping the fuzzy multisets of objects onto a single object is the job of the merge function in our work. Such functions are often idempotent; that is, $\varpi(\tilde{u}, \tilde{u}, \ldots, \tilde{u}) = \tilde{u}$ for all $\tilde{u} \in U$. This property also holds in this paper, and the corresponding proof is given in a later section.
A brief review of two important properties.
Property 1 (preservation). A merge function is preservative when it only selects one of the elements from the source set; that is, the result of the merge is always one of the sources.
Property 2 (majority rule). If the multiplicity of an element is larger than half of the cardinality of the source set, then this element must be selected by the merge function.
The majority rule is an important property for merge functions on multisets; it was further studied in [25], and a weaker version was proved in [26]. So far, the majority rule has not been extended to fuzzy multisets, as it does not apply in general, but the preservation rule is elaborated in our paper.
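The two properties can be checked mechanically on a crisp multiset of sources. The following is a minimal sketch under that assumption; the function names `majority_element`, `is_preservative_on`, and `mode_merge`, as well as the example data, are hypothetical illustrations and not part of the framework in [11].

```python
from collections import Counter

def majority_element(sources: Counter):
    """Return the element whose multiplicity exceeds half the cardinality, else None."""
    cardinality = sum(sources.values())
    for element, count in sources.items():
        if count > cardinality / 2:
            return element
    return None

def is_preservative_on(merge, sources: Counter) -> bool:
    """Check, for this one source multiset, that the merge result is one of the sources."""
    return sources[merge(sources)] > 0

def mode_merge(sources: Counter):
    """A simple (assumed) merge function: select the most frequent source."""
    return sources.most_common(1)[0][0]

# Hypothetical source multiset: five sources, three of which agree.
sources = Counter({"spoiled": 3, "fresh": 1, "sour": 1})
```

For this multiset, a merge function that respects the majority rule must return the element that `majority_element` reports, and `mode_merge` does so while also being preservative on this input.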

Merging of Fuzzy Multisets.
Within the scope of OMFM, we focus on the case of object merging of compound a multiset and multiple fuzzy multisets with the function of the type below: where the elements of M() are denoted as M() , and the elements of M ( M ()) are  () .Here, the multiset  () could be denoted as The fundamental operator is mapping the data segments of fuzzy multisets onto one object, which is called a solution.In following sections the symbol  is used to represent a random solution of a given merge function; that is, () = .
Note that the set of all fuzzy multisets over the universe, ordered by inclusion, is not an upper-bounded lattice, and the normalization criterion needed when applying merge functions is usually omitted in fuzzy multiset theory. Therefore, we introduce another property below.

Property 3 (boundedness). A bounded merge function 𝜛
over M() should satisfy the following constraint: It indicates that the merge function selects one of the elements from all the source sets.A corresponding inference is that This inference explains that any element not belonging to any source set should not exist in the outcomes of a bounded merge function.We could easily get this natural property just from the observation, because element  with membership degree () should not be mixed into a solution arbitrarily.Also, it is a weaker notion of preservation.Besides, we also formulate the enforcing preservation: Then, Property 3 is equivalent to indicating Paper [11] has pointed out that keeping the weaker version of Property 2 in the situation of multiset is advantageous.They take multidocument summarization (MDS) as an example to explain that keeping a strict preservation would lead to a bad result in practical situations, that is, one of the documents itself would be the summary of the entire document set.While the task of document fusion (DF) is to generate a text containing all the information in entire document set.So, a weaker version of Property 2 is also advantageous in our framework.The bounded merge function of fuzzy multiset will be further elaborated in subsequent sections.
Proof.We can get the proof from the case that for any

Optimal Merging of Fuzzy Multisets
4.1. Quality Measures. The purpose of defining quality measures is to construct merge functions that perform well for object merging over multiple fuzzy multisets. On one hand, the behavior of the merge functions we define can be characterized by the values of the quality measures. On the other hand, adjusting a merge function can optimize the balance between the accuracy and completeness of a given solution and thereby yield better values of the quality measures. The relationship between the merge functions and the quality measures is shown in Figure 1.
In this paper, we adopt two quality measures widely used in the text mining literature: the first is purity [13], and the second is entropy [14,15]. Information entropy is a concept from information theory that measures the amount of information; it is often taken as a measure of "disorder," that is, the higher the value of entropy, the higher the extent of disorder. Information purity is a measure of the correlation between a system and its environment, where a higher value of purity means that the system is more relevant to its environment. Both measures take values in the interval [0, 1]. Basically, maximum purity and minimum entropy of the results are the goals we try to achieve. Nevertheless, to analyze the effect of a merge function, we should be able to analyze it at the fundamental level of the elements. So, some local quality measures are introduced first.
Definition 13 (local precision).Given a multiset  () = { M( 1 ) , M( 2 ) , . . ., M(  ) }, the local precision of the element  could be defined as (23) such that Count  () ( M() ) . ( The local precision judges the accurateness of adding the element  with the membership degree  into the solution. Here,  () is a multiset of sources. * judges the proportion of fuzzy multisets where the membership degree of element  is .
Property 4 (monotonicity of local precision). Local precision is a decreasing function of the membership degree threshold.
The monotonicity of local precision is a natural property. A lower membership degree threshold means that more sources contribute to the solution, because a higher membership degree indicates relatively simple relations to a concept (say, a synonym of a word), while a lower membership degree indicates less specific and more layered descriptions of the concept. As a result, lowering the threshold yields more complete information with higher precision.
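The monotonicity can be illustrated with a small sketch. It rests on an assumption about the definition: local precision is taken here to be the fraction of source fuzzy multisets in which the element occurs with a membership degree of at least the threshold. Each source is represented as a dictionary from a concept to the list of its membership degrees; the function name `local_precision` and the example sources are hypothetical.

```python
def local_precision(sources, element, alpha):
    """Fraction of sources that contain `element` with some membership degree >= alpha.

    Assumed reading of local precision; `sources` is a list of fuzzy multisets,
    each a dict mapping an element to a list of membership degrees.
    """
    if not sources:
        return 0.0
    hits = sum(
        1 for source in sources
        if any(mu >= alpha for mu in source.get(element, []))
    )
    return hits / len(sources)

# Hypothetical sources: three documents over the concepts a, b, and c.
sources = [
    {"a": [1.0, 0.5], "b": [0.5]},
    {"a": [1.0], "b": [0.5, 0.2], "c": [0.2]},
    {"a": [0.5], "c": [0.2]},
]

# Local precision of concept "a" for increasing thresholds.
values = [local_precision(sources, "a", alpha) for alpha in (0.2, 0.5, 1.0)]
```

Under this reading, raising the threshold can only remove supporting sources, so the sequence `values` never increases.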
Definition 15 (purity). Purity is computed using the maximal local precision value for each element in the solution.
Property 5 (monotonicity of  * ). Local entropy  * is an increasing function of the membership degree threshold  when 0.5 ≤  * ≤ 1 and a decreasing function of the membership degree threshold  when 0 ≤  * ≤ 0.5.
Property 5 implies that the variation trend of local entropy is affected by both the fuzziness and the proportion of an element in a solution; that is, neither excessively detailed nor excessively brief information, and neither too many nor too few sources in the solution, is appropriate for enriching the information of a fusion system. The proofs of these natural properties are omitted here. This property also reflects the important connection between local precision and local entropy.
Definition 18 (total entropy). The total entropy of  () is calculated as such that
Purity and entropy each express a quality measure, but their variation scales may be unequal. As mentioned above, maximum purity and minimum entropy of the results are the goals we try to achieve. Therefore, we introduce a validation index (VI) in which the two measures have similar variation scales. The rationale for this index is as follows. Generally, a good result has a high purity value and a low entropy value. That is to say, if the discrepancy between these two values is large, the value of the validation index is large, so a good result can be identified by the validation index. In other words, the validation index expresses a balance between purity and entropy. To bring the variation scales of the two values closer together, we introduce a constant that adjusts the relative scales of the purity value and the entropy value. In practice, this constant is an empirical value: it is determined by selecting the best VI during simulation and refined through an iterated procedure. How to determine the value of this constant is not the main concern of this paper and is not discussed in depth here; in future work, we will explore it with experimental analysis.
Note that, for any solution, the VI-value is nonzero if and only if the local precisions of all elements in the solution differ from zero.

Optimization of Quality.
The effect of a merge function can be judged by the quality measures introduced above. We now investigate the solutions that optimize the values of the quality measures. This type of optimization problem also appears in other research fields. For example, [27] uses the transitive closure as a mechanism for transforming a matrix into a fuzzy equivalence relation and thereby finds approximate partitions of data sequences; it is a classic example in fuzzy set theory. Another example searches for an approximate minimum-distance solution by transforming a fuzzy reciprocal relation into a transitive reciprocal relation [28]. That is to say, the optimization mechanism is not unique. In the next step, we concentrate on the maximum quality expressed by the VI-value (maximization of purity and minimization of entropy). The difficulty of this step is to find the solution that attains the best VI-value. Therefore, the main task here is to define and investigate a suitable merge function, the VI-optimal merge function, some properties of which are studied below. A notable point is that several solutions may share one maximum VI-value, so selecting a unique solution is an important task. Therefore, a selection criterion that selects one solution from the set of optimal solutions is needed when applying these merge functions. For the specific application area of OMFM, we show the details in the illustrative examples. Another problem is that a solution with a nonzero VI-value does not always exist. Hence, the notion of an invalid solution is given below.

Definition 22 (invalid solution). Assume a VI-optimal merge function 𝜛 and a fuzzy multiset of sources 𝑀 ∈ M( M(𝑈)).
A candidate solution is defined as an invalid solution of 𝜛(𝑀) if its VI-value equals zero. A solution of a VI-optimal merge function 𝜛 that is not invalid is called a valid solution. We next introduce another significant theorem.

Theorem 23. Any solution that is a proper subset of the source intersection or a proper superset of the source union has a VI-value of zero.
Proof.Assume a fuzzy multiset of source  () = { M( 1 ) , M( 2 ) , . . ., M(  ) }.
(1) Consider a solution that is a proper subset of the source intersection. All the elements of such a solution generate a local precision equal to 0, so its VI-value is zero.
(2) Consider a solution that is a proper superset of the source union. Again, all the elements of such a solution generate a local precision equal to 0, so its VI-value is zero.
The conclusion is that a valid solution of a VI-optimal merge function should include the intersection of the sources and should be included in the union of the sources. In view of this, we define the intersection of the sources as the lower bound and the union of the sources as the upper bound. Hence, in the following section we consider only solutions that contain the lower bound and are contained in the upper bound.
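The bound condition can be sketched in code. The sketch assumes the usual sequence-based operations on fuzzy multisets: the membership degrees of each element are sorted in decreasing order and padded with zeros, intersection takes the pointwise minimum, union the pointwise maximum, and inclusion is pointwise comparison. The representation (a dictionary from an element to a list of membership degrees), all function names, and the example sources are hypothetical.

```python
from functools import reduce
from itertools import zip_longest

def _seq(fm, x):
    """Membership sequence of x in fuzzy multiset fm, sorted in decreasing order."""
    return sorted(fm.get(x, []), reverse=True)

def _combine(a, b, op):
    """Apply op pointwise to zero-padded membership sequences; drop zero degrees."""
    result = {}
    for x in set(a) | set(b):
        pairs = zip_longest(_seq(a, x), _seq(b, x), fillvalue=0.0)
        merged = [m for m in (op(p, q) for p, q in pairs) if m > 0.0]
        if merged:
            result[x] = merged
    return result

def fm_intersection(a, b):
    return _combine(a, b, min)

def fm_union(a, b):
    return _combine(a, b, max)

def fm_included(a, b):
    """a is included in b if every membership sequence of a is pointwise <= that of b."""
    return all(
        p <= q
        for x in a
        for p, q in zip_longest(_seq(a, x), _seq(b, x), fillvalue=0.0)
    )

def within_bounds(solution, sources):
    """True if the solution contains the lower bound and is contained in the upper bound."""
    lower = reduce(fm_intersection, sources)
    upper = reduce(fm_union, sources)
    return fm_included(lower, solution) and fm_included(solution, upper)

# Hypothetical sources over the concepts a, b, and c.
sources = [
    {"a": [1.0, 0.5], "b": [0.5]},
    {"a": [1.0], "b": [0.5, 0.2], "c": [0.2]},
]
```

With these sources, the lower bound is `{"a": [1.0], "b": [0.5]}`, so a candidate such as `{"a": [1.0], "b": [0.5], "c": [0.2]}` passes the check, while a candidate that omits `b` or introduces an element found in no source fails it.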
Theorem 24. A VI-optimal merge function is idempotent.
Thus, for  =  = , we have that The corresponding proof is also shown when applying the previous theorem.

Theorem 25. A VI-optimal merge function 𝜛 is bounded.
Proof. The conclusion follows from Theorems 12 and 23.
An important point is that VI-optimal merge functions do not satisfy the property of preservation. Nevertheless, by the theorem just proved, they are bounded, and, as shown in the previous section, boundedness offers a weaker version of preservation. Besides the theorem of boundedness, several other theorems relevant to VI-optimal merge functions deserve mention. One of them, stated below, concerns the invariance of VI-optimality under scaling of the multiplicities of the sources.
Theorem 26.Assume a fuzzy multiset  () = { M( 1 ) , M( 2 ) , . . ., M(  ) } and a merge function .A conclusion could be got that Proof.Several facts could be got that Proof.We could get the corollary in last theorem.

An Application: Document Fusion with Illustrative Example
5.1. Document Fusion. One possible application of this fuzzy multiset framework is document fusion, which involves the merging of elements with different relations embedded. Document fusion is closely related to multidocument summarization, so we introduce the latter briefly. The important difference between the two areas is that multidocument summarization aims to generate the shortest description containing the most relevant information, while document fusion aims to generate the shortest description containing all information in the whole document set, excluding redundancy [15,16]. Loosely speaking, multidocument summarization corresponds to the intersection of the documents and document fusion to their union. Unlike multidocument summarization, document fusion does not yet have an organization like DUC (Document Understanding Conference) [29] that provides "ideal" datasets, that is, multiple documents on the same subject together with ideal results for testing. In addition, the intrinsic and extrinsic evaluations used for multidocument summarization systems are not well suited to the fusion task. Intrinsic evaluation, in which humans assess the quality of the fused document itself, makes the evaluation process subjective [30]; moreover, there is as yet no collection of human-written fusion results of multiple documents that could serve as a gold standard for such evaluations. Extrinsic evaluation, in which the fused document is evaluated by the completion of a specific task, makes the evaluation process more complicated. Thus, there are no standard methods for assessing work on the fusion task comparable to those used in some document summarization tasks [31][32][33]. Given these problems, the evaluation that we have performed to date is limited. To demonstrate our work, we selected an article cluster concerning complaints about the spoilage of dairy products of a particular brand from the "315 consumption complaint" website, and we use it to show the general fusion process and the results obtained with our framework. Although we use Chinese text for illustration, there is no fundamental difference between Chinese, English, or other languages under this framework.
This paper proposes a framework for document fusion, so we aim not only at keywords but at comprehensive information. Here, we consider only the fuzzy multiset setting. With this extension, the membership degree can be used to express the importance and fuzziness of an element, which makes the document representation more granular and semantically richer than the multiset merging model in [11]. Assigning different weights to the same element also makes sense when semantically related terms with different semantic relations are used to identify a concept, which is semantically richer than using words alone. Under our framework, semantic methods and statistical methods can be combined and used in many domains.

Illustrative Example.
The main processing step is to obtain the Extra Strong, Strong, and Medium Strong relations of every concept in each article by using HowNet [34]. As a common-sense knowledge base, HowNet unveils interconceptual and interattribute relations of concepts. In HowNet, every concept of a word or phrase and its description form one entry, with relations such as hypernym, hyponym, synonym, antonym, meronym, and holonym (described in Table 1) presented in the DEF (concept definition), as shown in Box 1.
When processing English text, WordNet, a large lexical database of English, could be used to identify these relations instead of HowNet. Here, the textual intention structure is determined by three relations of every concept. As an indicator in linguistic segments, the three relation segments of every concept tend to indicate the theme segments. That is to say, once the three relations of every concept have been confirmed, the corresponding linguistic segments have a determinate tendency. Each concept is defined as an element in fuzzy multisets, and the three relation segments included in each concept determine the different membership degrees of each element, as shown in Table 2.

| Relations | Descriptions |
| --- | --- |
| Hypernym | Also known as a superordinate; a word referring to broad categories or general concepts. For example, "musical instrument" is a hypernym of "guitar" because a guitar is a musical instrument. |
| Hyponym | A word or phrase whose semantic field is included within that of another word; a hyponym shares a type-of relationship with its hypernym. For example, "pigeon," "crow," "eagle," and "seagull" are all hyponyms of "bird." |
| Synonym | A word with the same or a similar meaning as another word. |
| Antonym | A word with the opposite meaning of another word. |
| Meronym | Denotes a constituent part of or a member of something. For example, "finger" is a meronym of "hand" because a finger is part of a hand. Similarly, "wheels" is a meronym of "automobile." |
| Holonym | Defines the relationship between a term denoting the whole and a term denoting a part of or a member of the whole. For example, "tree" is a holonym of "bark," of "trunk," and of "limb." |
As mentioned above, document fusion produces the shortest description containing all information found within the document set, but without repetition. The solution that we need is one that includes all the key concepts (, , and  in this example), and these concepts are constructed from three relations (Extra Strong, Strong, and Medium Strong). Let us consider the solution  = {(, 1), (, 0.5), (, 0.2)}. We get the local precision of all elements in this solution. On the same principle, we present the local precision, purity, local entropy, entropy, and VI-value of the solutions with only one semantic relation embedded in each concept. If every concept is treated as equally important, with a single relation embedded, the maximal VI-value is 0.355 (see Table 3).
As mentioned in the previous section, a valid solution of a VI-optimal merge function must contain the intersection of the sources and be contained in their union, and only solutions that satisfy this constraint should be considered in practical applications. Within this constraint, more complicated solutions can also be considered. For example, when concept  is an important concept that needs to be described explicitly in the fusion result, more details, such as the Medium Strong relation in Table 2, should be included to construct the description. If the concepts are not treated as equally important, each with a single relation embedded, two solutions attain the maximal VI-value of 0.491 (see Table 4). Generally, for any solution, we can calculate the VI-value and choose the solution with the maximal VI-value. Because a unique maximum does not always exist, an auxiliary selection criterion that helps select a solution from the set of optimal solutions is necessary. From the observations of  10 and  19 , we need to decide on the merge function according to the actual requirement or find a new solution with more semantic relations embedded.
More examples, in which concept  and concept  are treated as important concepts that need to be described explicitly in the fusion result, are shown in Tables 5 and 6. When concept  and concept  have a similar scale of multiple relations embedded, we obtain the same maximum VI-value of 0.410 from  28 and  38 , which means that these two strategies have the same effect. If only one concept has multiple relations embedded, we should consider  10 to  45 and obtain the maximum VI-value of 0.491 from  10 or  19 . In both situations, a further selection is needed that takes the specific application into account. For  28 and  38 , the two solutions select the Strong and Medium Strong relations for concept  and concept , so the importance of each concept should be considered further; that is to say, if concept  is more important for fusion,  38 should be selected. For  10 or  19 , when the fusion result needs to be described in more detail, the corresponding solution  10 should be selected; otherwise,  19 is also a candidate for fusion.
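The selection step described above can be sketched as follows. The VI-values 0.491 and 0.410 are the ones reported in the text for the solutions indexed 10, 19, 28, and 38; everything else is an assumption made for illustration: the labels `s10`, `s19`, `s28`, and `s38`, the function names, and the `detail_level` scores used as the auxiliary criterion.

```python
def optimal_solutions(vi_values, tolerance=1e-9):
    """Return the labels of all solutions that share the maximum VI-value."""
    best = max(vi_values.values())
    return sorted(
        label for label, vi in vi_values.items() if abs(vi - best) <= tolerance
    )

def select_solution(vi_values, preference):
    """Break ties among the optimal solutions with a caller-supplied key (lowest wins)."""
    return min(optimal_solutions(vi_values), key=preference)

# VI-values reported in the text; the labels are hypothetical.
vi_values = {"s10": 0.491, "s19": 0.491, "s28": 0.410, "s38": 0.410}

# Hypothetical auxiliary criterion: prefer the solution with more detail.
detail_level = {"s10": 2, "s19": 1, "s28": 2, "s38": 2}
chosen = select_solution(vi_values, preference=lambda label: -detail_level[label])
```

The maximum alone leaves two candidates, so the auxiliary criterion decides; with a preference for more detail the first of the two tied solutions is chosen, which mirrors the reasoning in the paragraph above.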
Figure 2 shows the distribution of the VI-values for different scales of embedded semantic relations using a box-and-whisker plot, which evaluates the effectiveness of the merge function. Only part of the VI-values are presented here. The top bar stands for the maximum observed value; the bottom bar represents the lowest observed value. The bottom of the box is the lower quartile with 25% of
We have not yet evaluated all the merge functions with their corresponding VI-values in our illustrative example. The data and explanation given above only serve to illustrate the proposed framework. We conclude that a corresponding VI-value can be obtained for any solution and that the VI-value can be used to select the best solution. Our framework is flexible enough to generate the shortest description containing all information found within the document set, at different levels of detail depending on practical requirements. With this framework, neither the relations between the keywords nor the context of co-text is lost. To generate a moderately fluent semantic fusion result from a collection of documents, sentence planning and regeneration are then used to combine the segments into a coherent whole. The framework thus addresses two basic issues in developing a document fusion system. (1) The documents are fused at more levels of granularity, as we assign different weights to the different semantic relations embedded in the same concept element. (2) Taking semantic relations into account in the fusion process helps ensure the readability of the fused document.

Conclusion and Future Work
We have presented OMFM, a framework that maps fuzzy multisets of objects onto one object. The framework merges multiple fuzzy multisets of documents, where a document set is modeled as a multiset of documents and each document is modeled as a fuzzy multiset of concepts. OMFM extends the work in [11] and can describe both the uncertainty and the repeatability of natural language by using the membership degree to express the semantic fuzziness of the objects. Two quality measures widely used in the text mining literature are defined to quantify the result of a merge function: purity (a measure of correctness) and entropy (a measure of completeness), where the maximum purity is attained by the upper solution and the minimum entropy is attained by the lower solution. We have then constructed the VI-optimal merge function to obtain the best solution, in which higher purity and lower entropy are achieved simultaneously. Moreover, we have proved properties related to the constraints of the merging problem. Finally, the document fusion application illustrates the practicality of OMFM. Given its theoretical value and prospects for application, we expect the object merging problem to attract research attention in many domains. Future work will focus on experimental research and on applying this framework to more related problems.

Figure 1: The relationship between the merge functions and quality measures.

Figure 2: The distribution of VI-value with different scales of semantic relations embedded.

Table 1: Descriptions for different semantic relations.

Table 3: VI-values of solutions (concept with single relation embedded).

Table 4: VI-values of solutions (concept  with multiple relations embedded).
solution and lower solution). The -axis presents the VI-values. The VI-values achieved by different merge functions span a wide range. Thus, simple strategies are not likely to work well. Observing the performance of the multiple relations embedded strategy, we see that

Table 5: VI-values of solutions (concept  with multiple relations embedded).

Table 6: VI-values of solutions (concept  with multiple relations embedded).