Information fusion is the process of merging information from multiple sources into a new set of information. Existing work on information fusion applies to scenarios such as multiagent systems, group decision making, and multidocument summarization. This paper develops a framework for solving the object merging problem based on fuzzy multisets. The objects considered here are data segments in a document fusion task, that is, concepts together with their semantically related terms under different semantic relations. The fundamental operation is the merge function, which maps the data segments in multiple fuzzy multisets onto one object, called a solution. Under this framework, we define the quality measures purity and entropy to quantify the quality of solutions, balancing the accuracy and completeness of the results. A merge function that yields solutions of this kind is called a VI-optimal merge function, and a series of its theoretical properties are studied. Finally, we investigate the proposed framework in a special application scenario, document fusion, which is related to multidocument summarization, and show how the framework works with an illustrative example.
As an important research area, information fusion is the process of merging information from multiple sources into a new set of information. Its applications include heterogeneous databases, multiagent systems, group decision making, and multidocument summarization. Different application scenarios call for different principles and procedures. Many classical mathematical theories of aggregation operators [
To address the issues in the third type of fusion, a framework of object merging based on multiset theory has recently been investigated, which can be used to solve the problem of multidocument summarization (
In this paper, we also focus on the problem of object merging in information fusion, and our work should be treated as an extension of the framework mentioned above. There are several differences between the two works. The basic difference concerns the definition of coreferent objects: coreferent objects in paper [
This paper is organized as follows. In Section
In mathematics, a fuzzy set, introduced by Zadeh in 1965, is a set whose elements have degrees of membership; it extends the classical notion of a set [
The membership function
The membership function over
According to the definition of a fuzzy set, the extent to which an object belongs to a set is no longer fixed, and the membership of each object falls in the interval
A multiset
The cardinality of a multiset
A multiset
There are some basic operators and relations of multiset below: Inclusion:
Equality:
Intersection:
Union:
Addition:
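The operators listed above can be sketched in Python with `collections.Counter` standing in for a multiset. This is a minimal illustration that assumes the usual definitions (inclusion compares multiplicities element-wise, intersection takes the minimum, union the maximum, and addition the sum); the elements `"x"`, `"y"`, `"z"` and the helpers `cardinality` and `is_included` are hypothetical names introduced only for this example.

```python
from collections import Counter

def cardinality(m: Counter) -> int:
    """Total number of occurrences, i.e. the sum of all multiplicities."""
    return sum(m.values())

def is_included(a: Counter, b: Counter) -> bool:
    """Inclusion: every multiplicity in a is at most the multiplicity in b."""
    return all(n <= b[x] for x, n in a.items())

# Illustrative multisets over hypothetical elements "x", "y", "z".
A = Counter({"x": 2, "y": 1})
B = Counter({"x": 3, "y": 1, "z": 2})

intersection = A & B  # element-wise minimum of multiplicities
union = A | B         # element-wise maximum of multiplicities
addition = A + B      # element-wise sum of multiplicities

print(is_included(A, B), cardinality(addition))
```

Equality of two multisets then corresponds to inclusion in both directions, which for `Counter` objects with positive counts coincides with `A == B`.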
The
Note that the difference between the notation
Combined with the concept of fuzzy set, the traditional concept of multiset could be denoted as
A fuzzy multiset
The concept of multiplicity
The number of occurrence or cardinality of a fuzzy multiset
There are some basic operators and relations of fuzzy multiset below: Inclusion:
Equality:
Addition: For Intersection:
Union:
Note that when performing any operator for two fuzzy multisets, the length of the membership degree sequences
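As a minimal sketch of these operators, a fuzzy multiset can be represented as a mapping from each element to its sequence of membership degrees. The code assumes the commonly used convention that the sequences are sorted in decreasing order and padded with zeros to equal length before being compared position by position; that convention, the dictionary representation, and all function names are assumptions made for illustration.

```python
from itertools import zip_longest
from typing import Callable, Dict, List, Tuple

FuzzyMultiset = Dict[str, List[float]]

def aligned(a: FuzzyMultiset, b: FuzzyMultiset, x: str) -> List[Tuple[float, float]]:
    """Pair the membership degrees of x in a and b, sorted decreasingly and zero-padded."""
    seq_a = sorted(a.get(x, []), reverse=True)
    seq_b = sorted(b.get(x, []), reverse=True)
    return list(zip_longest(seq_a, seq_b, fillvalue=0.0))

def combine(a: FuzzyMultiset, b: FuzzyMultiset,
            op: Callable[[float, float], float]) -> FuzzyMultiset:
    """Apply op position by position and drop zero membership degrees."""
    result: FuzzyMultiset = {}
    for x in sorted(set(a) | set(b)):
        degrees = [d for d in (op(p, q) for p, q in aligned(a, b, x)) if d > 0.0]
        if degrees:
            result[x] = degrees
    return result

def union(a: FuzzyMultiset, b: FuzzyMultiset) -> FuzzyMultiset:
    return combine(a, b, max)

def intersection(a: FuzzyMultiset, b: FuzzyMultiset) -> FuzzyMultiset:
    return combine(a, b, min)

def addition(a: FuzzyMultiset, b: FuzzyMultiset) -> FuzzyMultiset:
    """Concatenate the two sequences of each element and re-sort them decreasingly."""
    return {x: sorted(a.get(x, []) + b.get(x, []), reverse=True)
            for x in sorted(set(a) | set(b))}

def is_included(a: FuzzyMultiset, b: FuzzyMultiset) -> bool:
    return all(p <= q for x in a for p, q in aligned(a, b, x))

# Hypothetical concepts c1, c2, c3 with illustrative membership degrees.
A = {"c1": [1.0, 0.5], "c2": [0.2]}
B = {"c1": [0.5], "c3": [0.5, 0.2]}
print(union(A, B), intersection(A, B), addition(A, B))
```

Padding with zeros means that an element absent from one fuzzy multiset behaves as if it occurred there with membership degree 0, so the same code path covers both shared and unshared elements.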
A fuzzy multiset
The set of all fuzzy multisets drawn from a universe
The
We have reviewed the most relevant definitions in the previous section. As mentioned earlier, the framework in our paper extends the work in paper [
The basis of the framework is the redefinition of coreferent objects and of the merge function in OMFM.
Reference function
Let
The definition above formalizes axiomatically the notion of two objects that describe the same real-world concept, each with semantically related terms of different semantic relations embedded. Here, we take the context as the baseline: when a theme is described in a document, semantically related terms for the concept are used to extend the theme.
The merge function in OMFM is represented by function
In our work, the merge function maps the fuzzy multisets of objects onto a single object. Such functions are often idempotent; that is,
We briefly review two important properties of merge functions.
A merge function is preservative when it selects only one of the elements from the source set. The property of preservation in OMFM is denoted as
If the multiplicity of an element is larger than half the cardinality of the source set, then this element must be selected by the merge function, which is denoted as
The majority rule above is an important property for merge function in multiset that was further studied in [
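The two properties reviewed above can be checked mechanically for a given merge function and source. The following is a minimal sketch in which the source is a crisp multiset of objects represented by a `Counter`; the two merge functions `most_frequent` and `smallest` are hypothetical examples, not functions defined in this paper.

```python
from collections import Counter
from typing import Callable

Merge = Callable[[Counter], str]

def most_frequent(source: Counter) -> str:
    """Hypothetical merge function: pick the object with the highest multiplicity."""
    return max(sorted(source), key=lambda x: source[x])

def smallest(source: Counter) -> str:
    """Hypothetical merge function that ignores multiplicities."""
    return min(source)

def is_preservative(merge: Merge, source: Counter) -> bool:
    """Preservation: the result is one of the elements of the source."""
    return source[merge(source)] > 0

def respects_majority(merge: Merge, source: Counter) -> bool:
    """Majority rule: an element with multiplicity above half the cardinality is selected."""
    half = sum(source.values()) / 2
    for x, n in source.items():
        if n > half:
            return merge(source) == x
    return True  # no majority element, so the rule imposes no constraint

source = Counter({"a": 1, "b": 3})  # "b" occurs 3 times out of 4
print(respects_majority(most_frequent, source), respects_majority(smallest, source))
```

Both example functions are preservative on this source, but only `most_frequent` respects the majority rule, which shows that the two properties are independent of each other.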
Within the scope of OMFM, we focus on object merging over a compound of a multiset and multiple fuzzy multisets, with a function of the type below:
The fundamental operator maps the data segments of fuzzy multisets onto one object, which is called a solution. In the following sections, the symbol
Given a fuzzy multiset over
The case is
A bounded merge function
It indicates that the merge function selects one of the elements from all the source sets. A corresponding inference is that
This inference explains that any element not belonging to any source set should not exist in the outcomes of a bounded merge function. We could easily get this natural property just from the observation, because element
Paper [
The functions
We can get the proof from the case that for any
The purpose of defining quality measures is to construct merge functions that perform well for object merging over multiple fuzzy multisets. On the one hand, the behavior of the merge functions we define can be characterized by the values of the quality measures. On the other hand, adjusting a merge function can optimize the balance between the accuracy and completeness of a given solution and so yield higher values of the quality measures. The relationship between the merge functions and the quality measures is shown in Figure
The relationship between the merge functions and quality measures.
Within the scope of our paper, we adopted two quality measures widely used in the text mining literature: the first one is purity [
Given a multiset
Given the multiset
Local precision
Purity is computed using the maximal local precision value for each element in the solution
Given the multiset
Then, we get the purity
The local entropy of each fuzzy multiset in
Local precision
Property
The total entropy of
Given the multiset
Purity and entropy each express one aspect of quality, but their variation scales may differ. As mentioned above, our goal is a solution with maximum purity and minimum entropy. Therefore, we investigate a single index in which the two measures vary on similar scales.
Given a multiset of sources
Next, we show the rationale for this index. Generally, a good result has a high purity and a low entropy. That is, if the difference between these two values is large, the validation index is large, so the validation index identifies good results and expresses a balance between purity and entropy. In the case where the variation scales of these two values are similar, we propose a constant value
Note that for any solution
The effect of a merge function can be judged by the quality measures introduced in the previous section. We then investigate the solutions that optimize the values of the quality measures. This type of optimization problem also appears in other research fields; paper [
A VI-value merge function
We now study some properties of VI-optimal merge functions. A notable point is that several solutions may share the same maximum VI-value. Since a merge function must return a single solution, a selection criterion that picks one solution from the set of optimal solutions is needed when applying these merge functions. For the special application area of OMFM, we show the details in the illustrative examples. Another problem is a solution
Assume a VI-optimal merge function
A multiset
Any solution that is a proper subset of the source intersection or a proper superset of the source union has that
Assume a fuzzy multiset of source A solution Also it satisfies
Owing to the case of all the elements of the solution A solution also satisfies
Owing to the case of all the elements of the solution
The conclusion is that a valid solution of a VI-optimal merge function should include the intersection of the sources and should be included in the union of the sources. In view of this, we define the intersection of the sources as the lower bound and the union of the sources as the upper bound. The formalized definition is shown as
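The bound condition described in this paragraph can be sketched as follows. For brevity the sketch uses crisp multisets (`Counter` objects) rather than fuzzy multisets, and the helper names and example sources are assumptions introduced only for illustration.

```python
from collections import Counter
from functools import reduce
from typing import List

def source_intersection(sources: List[Counter]) -> Counter:
    """Lower bound: element-wise minimum of multiplicities over all sources."""
    return reduce(lambda a, b: a & b, sources)

def source_union(sources: List[Counter]) -> Counter:
    """Upper bound: element-wise maximum of multiplicities over all sources."""
    return reduce(lambda a, b: a | b, sources)

def included(a: Counter, b: Counter) -> bool:
    return all(n <= b[x] for x, n in a.items())

def within_bounds(solution: Counter, sources: List[Counter]) -> bool:
    """True if the solution lies between the source intersection and the source union."""
    return (included(source_intersection(sources), solution)
            and included(solution, source_union(sources)))

# Three hypothetical sources over the elements a, b, c.
sources = [Counter(a=2, b=1), Counter(a=1, b=1, c=1), Counter(a=1, b=2)]
print(within_bounds(Counter(a=1, b=1, c=1), sources))
```

A candidate that omits an element shared by every source violates the lower bound, and a candidate that introduces an element found in no source violates the upper bound.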
A VI-optimal merge function is idempotent.
Assume the fuzzy multiset
Thus, for
A VI-optimal merge function
With Theorems
An important point is that VI-optimal merge functions do not satisfy the property of preservation. Nevertheless, by the theorem just proved, they are bounded, and, as shown in the previous section, boundedness offers a weaker version of preservation. Besides the boundedness theorem, several other theorems about VI-optimal merge functions are worth mentioning. One of them, given below, states that VI-optimality is invariant under scaling of the multiplicities of the sources.
Assume a fuzzy multiset
Several facts could be got that
Assume a fuzzy multiset
The corollary follows from the preceding theorem.
One possible application of this fuzzy multiset framework is document fusion, which involves merging elements with different relations embedded. Document fusion is closely related to multidocument summarization, so we introduce the latter briefly. The important difference between the two areas is that multidocument summarization aims to generate the shortest description containing the most relevant information, whereas document fusion aims to generate the shortest description containing all the information in the whole document set, excluding redundancy [
This paper proposes a framework for document fusion, so we aim not only to extract keywords but also to capture comprehensive information. Here, we consider only the fuzzy multiset setting. With this extension, the membership degree can express the importance and fuzziness of an element, which makes the document representation more granular and semantically richer than the multiset merging model in paper [
The main processing that needs to be performed is to get the Extra Strong, Strong, and Medium Strong relations of every concept in each article by using HowNet [
Descriptions for different semantic relations.
Relations | Descriptions |
---|---|
Hypernym | Also known as a superordinate, which is a word referring to broad categories or general concepts. For example, “musical instrument” is a hypernym of “guitar” because a guitar is a musical instrument. |
Hyponym | A word or phrase whose semantic field is included within that of another word, a hyponym shares a type-of relationship with its hypernym, for example, “pigeon,” “crow,” “eagle,” and “seagull” are all hyponyms of “bird.” |
Synonym | A word with the same or similar meaning of another word. |
Antonym | A word which has the opposite meaning as another word. |
Meronym | Meronym denotes a constituent part of or a member of something. For example, “finger” is a meronym of “hand” because a finger is part of a hand. Similarly, “wheels” is a meronym of “automobile.” |
Holonym | Holonym defines the relationship between a term denoting the whole and a term denoting a part of or a member of the whole. For example, “tree” is a holonym of “bark,” of “trunk” and of “limb.” |
taste: DEF = attribute, taste, & edible shape: bearing:
When processing English text, WordNet, a large lexical database of English, can be used to identify these relations instead of HowNet. Here, the textual intention structure is determined by the three relations of every concept. As indicators in linguistic segments, the three relation segments of every concept tend to indicate the theme segments. That is, once the three relations of every concept have been confirmed, the corresponding linguistic segments have a determinate tendency. Each concept is defined as an element in fuzzy multisets, and the three relation segments included in each concept determine the membership degree of each element, as shown in Table
Three relations for constructing concepts.
Relation strength | Composition | Membership degree |
---|---|---|
Extra Strong | Identity | 1 |
Strong | ① Synonym ② Hypernym, Hyponym, Antonym | 0.5 |
Medium Strong | Meronym, Holonym | 0.2 |
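The mapping in the table above can be encoded directly as a lookup. The following minimal sketch uses the three strength levels and membership degrees from the table; the dictionary names and the helper functions are hypothetical and serve only to show how a concept's relations translate into a decreasing sequence of membership degrees.

```python
from typing import Iterable, List

# Membership degree per relation strength, as listed in the table above.
MEMBERSHIP_BY_STRENGTH = {
    "Extra Strong": 1.0,
    "Strong": 0.5,
    "Medium Strong": 0.2,
}

# Relation strength per semantic relation, as listed in the table above.
STRENGTH_BY_RELATION = {
    "identity": "Extra Strong",
    "synonym": "Strong",
    "hypernym": "Strong",
    "hyponym": "Strong",
    "antonym": "Strong",
    "meronym": "Medium Strong",
    "holonym": "Medium Strong",
}

def membership_degree(relation: str) -> float:
    """Look up the membership degree assigned to a semantic relation."""
    return MEMBERSHIP_BY_STRENGTH[STRENGTH_BY_RELATION[relation.lower()]]

def concept_memberships(relations: Iterable[str]) -> List[float]:
    """Membership degrees of one concept's relations, sorted decreasingly."""
    return sorted((membership_degree(r) for r in relations), reverse=True)

print(concept_memberships(["Meronym", "Identity", "Synonym"]))
```

An unknown relation name raises a `KeyError`, which is a deliberate choice in this sketch so that relations outside the table are not silently assigned a degree.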
In the following example, three semantic segments concerning different concepts
As mentioned above, document fusion produces the shortest description containing all the information found within the document set, without repetition. The solution we need is the one that includes all the key concepts (
On the same principle, we present the local precision, purity, local entropy, entropy, and VI-value of the solutions with only one semantic relation embedded in each concept. If every concept is treated as equally important with a single relation embedded, we obtain a maximal VI-value of 0.355 (see Table
VI-values of solutions (concept with single relation embedded).
Solution | Measure | 1 | 2 | 3 | Purity / Entropy | VI |
---|---|---|---|---|---|---|
1 | Local precision | 0.667 | 0.889 | 0.556 | 0.352 | 0.274 |
1 | Local entropy | 0.270 | 0.105 | 0.326 | 0.117 | |
2 | Local precision | 0.667 | 0.889 | 0.889 | 0.408 | 0.355 |
2 | Local entropy | 0.270 | 0.105 | 0.105 | 0.080 | |
3 | Local precision | 0.667 | 0.889 | 0.222 | 0.296 | 0.217 |
3 | Local entropy | 0.270 | 0.105 | 0.334 | 0.118 | |
4 | Local precision | 0.556 | 0.889 | 0.556 | 0.334 | 0.250 |
4 | Local entropy | 0.326 | 0.105 | 0.326 | 0.126 | |
5 | Local precision | 0.556 | 0.889 | 0.889 | 0.389 | 0.330 |
5 | Local entropy | 0.326 | 0.105 | 0.105 | 0.089 | |
6 | Local precision | 0.556 | 0.889 | 0.222 | 0.278 | 0.193 |
6 | Local entropy | 0.326 | 0.105 | 0.334 | 0.128 | |
7 | Local precision | 0.333 | 0.889 | 0.556 | 0.296 | 0.207 |
7 | Local entropy | 0.366 | 0.105 | 0.326 | 0.133 | |
8 | Local precision | 0.333 | 0.889 | 0.889 | 0.352 | 0.288 |
8 | Local entropy | 0.366 | 0.105 | 0.105 | 0.096 | |
9 | Local precision | 0.333 | 0.889 | 0.222 | 0.241 | 0.152 |
9 | Local entropy | 0.366 | 0.105 | 0.334 | 0.134 | |
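Selecting a solution by its VI-value can be sketched as follows. The rows are the purity, entropy, and VI values of the nine solutions in the table above, in table order; the VI-value of the second row is taken to be the maximal value 0.355 quoted in the text. The tie-breaking rule (take the first optimal row) is a hypothetical selection criterion chosen only for illustration.

```python
from typing import List, Tuple

Row = Tuple[float, float, float]  # (purity, entropy, VI)

# Values transcribed from the table above, in table order.
TABLE_ROWS: List[Row] = [
    (0.352, 0.117, 0.274),
    (0.408, 0.080, 0.355),  # VI taken from the maximal value quoted in the text
    (0.296, 0.118, 0.217),
    (0.334, 0.126, 0.250),
    (0.389, 0.089, 0.330),
    (0.278, 0.128, 0.193),
    (0.296, 0.133, 0.207),
    (0.352, 0.096, 0.288),
    (0.241, 0.134, 0.152),
]

def optimal_indices(rows: List[Row], tol: float = 1e-9) -> List[int]:
    """Indices of all rows whose VI-value equals the maximum (within tol)."""
    best = max(vi for _, _, vi in rows)
    return [i for i, (_, _, vi) in enumerate(rows) if abs(vi - best) <= tol]

def select_solution(rows: List[Row]) -> int:
    """Hypothetical selection criterion: the first optimal row in table order."""
    return optimal_indices(rows)[0]

best = select_solution(TABLE_ROWS)
print(best, TABLE_ROWS[best])
```

In this table the selected row is also the one with the highest purity and the lowest entropy, which matches the stated goal of maximum purity and minimum entropy.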
As mentioned in the previous section, a valid solution
We should only consider solution
VI-values of solutions (concept
Solution | Measure | 1 | 2 | 3 | 4 | Purity / Entropy | VI |
---|---|---|---|---|---|---|---|
1 | Local precision | 0.667 | 0.889 | 0.889 | 0.889 | 0.556 | |
1 | Local entropy | 0.270 | 0.105 | 0.105 | 0.105 | 0.098 | |
2 | Local precision | 0.667 | 0.889 | 0.889 | 0.556 | 0.500 | 0.411 |
2 | Local entropy | 0.270 | 0.105 | 0.105 | 0.326 | 0.134 | |
3 | Local precision | 0.667 | 0.889 | 0.889 | 0.222 | 0.445 | 0.354 |
3 | Local entropy | 0.270 | 0.105 | 0.105 | 0.334 | 0.136 | |
4 | Local precision | 0.556 | 0.889 | 0.889 | 0.889 | 0.537 | 0.466 |
4 | Local entropy | 0.326 | 0.105 | 0.105 | 0.105 | 0.107 | |
5 | Local precision | 0.556 | 0.889 | 0.889 | 0.556 | 0.482 | 0.386 |
5 | Local entropy | 0.326 | 0.105 | 0.105 | 0.326 | 0.144 | |
6 | Local precision | 0.556 | 0.889 | 0.889 | 0.222 | 0.426 | 0.329 |
6 | Local entropy | 0.326 | 0.105 | 0.105 | 0.334 | 0.145 | |
7 | Local precision | 0.333 | 0.889 | 0.889 | 0.889 | 0.500 | 0.424 |
7 | Local entropy | 0.366 | 0.105 | 0.105 | 0.105 | 0.114 | |
8 | Local precision | 0.333 | 0.889 | 0.889 | 0.556 | 0.445 | 0.345 |
8 | Local entropy | 0.366 | 0.105 | 0.105 | 0.326 | 0.150 | |
9 | Local precision | 0.333 | 0.889 | 0.889 | 0.222 | 0.389 | 0.288 |
9 | Local entropy | 0.366 | 0.105 | 0.105 | 0.334 | 0.152 | |
10 | Local precision | 0.667 | 0.889 | 0.889 | 0.889 | 0.556 | |
10 | Local entropy | 0.270 | 0.105 | 0.105 | 0.105 | 0.098 | |
11 | Local precision | 0.667 | 0.889 | 0.889 | 0.556 | 0.500 | 0.411 |
11 | Local entropy | 0.270 | 0.105 | 0.105 | 0.326 | 0.134 | |
12 | Local precision | 0.667 | 0.889 | 0.889 | 0.222 | 0.445 | 0.354 |
12 | Local entropy | 0.270 | 0.105 | 0.105 | 0.334 | 0.136 | |
13 | Local precision | 0.556 | 0.889 | 0.889 | 0.889 | 0.537 | 0.466 |
13 | Local entropy | 0.326 | 0.105 | 0.105 | 0.105 | 0.107 | |
14 | Local precision | 0.556 | 0.889 | 0.889 | 0.556 | 0.482 | 0.386 |
14 | Local entropy | 0.326 | 0.105 | 0.105 | 0.326 | 0.144 | |
15 | Local precision | 0.556 | 0.889 | 0.889 | 0.222 | 0.426 | 0.329 |
15 | Local entropy | 0.326 | 0.105 | 0.105 | 0.334 | 0.145 | |
16 | Local precision | 0.333 | 0.889 | 0.889 | 0.889 | 0.500 | 0.424 |
16 | Local entropy | 0.366 | 0.105 | 0.105 | 0.105 | 0.114 | |
17 | Local precision | 0.333 | 0.889 | 0.889 | 0.556 | 0.445 | 0.345 |
17 | Local entropy | 0.366 | 0.105 | 0.105 | 0.326 | 0.150 | |
18 | Local precision | 0.333 | 0.889 | 0.889 | 0.222 | 0.389 | 0.288 |
18 | Local entropy | 0.366 | 0.105 | 0.105 | 0.334 | 0.152 | |
More examples with considering concept
VI-values of solutions (concept
Solution | Measure | 1 | 2 | 3 | 4 | Purity / Entropy | VI |
---|---|---|---|---|---|---|---|
1 | Local precision | 0.667 | 0.889 | 0.889 | 0.556 | 0.500 | |
1 | Local entropy | 0.270 | 0.105 | 0.105 | 0.326 | 0.134 | |
2 | Local precision | 0.667 | 0.889 | 0.889 | 0.222 | 0.445 | 0.355 |
2 | Local entropy | 0.270 | 0.105 | 0.105 | 0.334 | 0.136 | |
3 | Local precision | 0.667 | 0.889 | 0.556 | 0.222 | 0.389 | 0.274 |
3 | Local entropy | 0.270 | 0.105 | 0.326 | 0.334 | 0.173 | |
4 | Local precision | 0.556 | 0.889 | 0.889 | 0.556 | 0.482 | 0.386 |
4 | Local entropy | 0.326 | 0.105 | 0.105 | 0.326 | 0.144 | |
5 | Local precision | 0.556 | 0.889 | 0.889 | 0.222 | 0.426 | 0.329 |
5 | Local entropy | 0.326 | 0.105 | 0.105 | 0.334 | 0.145 | |
6 | Local precision | 0.556 | 0.889 | 0.556 | 0.222 | 0.371 | 0.250 |
6 | Local entropy | 0.326 | 0.105 | 0.326 | 0.334 | 0.182 | |
7 | Local precision | 0.333 | 0.889 | 0.889 | 0.556 | 0.445 | 0.345 |
7 | Local entropy | 0.366 | 0.105 | 0.105 | 0.326 | 0.150 | |
8 | Local precision | 0.333 | 0.889 | 0.889 | 0.222 | 0.389 | 0.288 |
8 | Local entropy | 0.366 | 0.105 | 0.105 | 0.334 | 0.152 | |
9 | Local precision | 0.333 | 0.889 | 0.556 | 0.222 | 0.333 | 0.207 |
9 | Local entropy | 0.366 | 0.105 | 0.326 | 0.334 | 0.189 | |
VI-values of solutions (concept
Solution | Measure | 1 | 2 | 3 | 4 | Purity / Entropy | VI |
---|---|---|---|---|---|---|---|
1 | Local precision | 0.667 | 0.556 | 0.889 | 0.556 | 0.445 | 0.331 |
1 | Local entropy | 0.270 | 0.326 | 0.105 | 0.326 | 0.171 | |
2 | Local precision | 0.667 | 0.556 | 0.889 | 0.889 | 0.500 | |
2 | Local entropy | 0.270 | 0.326 | 0.105 | 0.105 | 0.134 | |
3 | Local precision | 0.667 | 0.556 | 0.889 | 0.222 | 0.389 | 0.274 |
3 | Local entropy | 0.270 | 0.326 | 0.105 | 0.334 | 0.173 | |
4 | Local precision | 0.667 | 0.333 | 0.889 | 0.556 | 0.408 | 0.289 |
4 | Local entropy | 0.270 | 0.366 | 0.105 | 0.326 | 0.178 | |
5 | Local precision | 0.667 | 0.333 | 0.889 | 0.889 | 0.463 | 0.372 |
5 | Local entropy | 0.270 | 0.366 | 0.105 | 0.105 | 0.136 | |
6 | Local precision | 0.667 | 0.333 | 0.889 | 0.222 | 0.352 | 0.233 |
6 | Local entropy | 0.270 | 0.366 | 0.105 | 0.334 | 0.179 | |
7 | Local precision | 0.333 | 0.556 | 0.889 | 0.556 | 0.389 | 0.264 |
7 | Local entropy | 0.366 | 0.326 | 0.105 | 0.326 | 0.187 | |
8 | Local precision | 0.333 | 0.556 | 0.889 | 0.889 | 0.445 | 0.345 |
8 | Local entropy | 0.366 | 0.326 | 0.105 | 0.105 | 0.150 | |
9 | Local precision | 0.333 | 0.556 | 0.889 | 0.222 | 0.333 | 0.207 |
9 | Local entropy | 0.366 | 0.326 | 0.105 | 0.334 | 0.189 | |
Figure
The distribution of VI-value with different scales of semantic relations embedded.
We have not enumerated all merge functions and their VI-values in the illustrative example; the data and explanation above are intended only to illustrate the proposed framework. We conclude that a VI-value can be computed for any solution and that the VI-value can be used to select the best solution. Our framework is flexible enough to generate the shortest description containing all the information found within the document sets, at different levels of detail depending on practical requirements. With this framework, neither the relations between the keywords nor the context provided by the co-text is lost. To generate a moderately fluent semantic fusion result from a collection of documents, sentence planning and regeneration are then used to combine the segments into a coherent whole. In this paper, the framework addresses the basic issues in developing a document fusion system.
We have presented a framework, OMFM, that maps fuzzy multisets of objects onto one object. In our framework for merging multiple fuzzy multisets of documents, a document set is modeled as a multiset of documents and each document is modeled as a fuzzy multiset of concepts. OMFM is also an extension of the work in paper [
With its comparatively high theoretical value and application prospects, the object merging problem is likely to attract research attention in many domains. Future work will focus on experimental research and on applying this framework to more related problems.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This research has been supported by the National Natural Science Foundation of China (Grant no. 61472049), the National Natural Science Foundation for Young Scholars of China (Grant no. 61300148), and the Key Scientific and Technological Project of Jilin Province (Grant no. 20130206051GX). In addition, Dr. Lin Yue has been awarded a scholarship under the State Scholarship Fund to pursue her study at the University of Queensland as a joint Ph.D. student, and this work has also been supported by the China Scholarship Council (CSC). Finally, the authors thank the anonymous reviewers for their constructive comments, which have substantially improved this paper.